ABSTRACT


This document provides a detailed description
and analysis of the preprocessing and segmentation
procedures used in the Vicens-Reddy speech
recognition system.
1. INTRODUCTION

This document provides a detailed description of the preprocessing and segmentation
procedures used in the Vicens-Reddy speech recognition system [1]. In addition,
explanations are given for a large number of previously unexplained heuristics
used in the program. In particular, special attention is called to the discussions
of the functions CLOS1 and PROXIM.

We  wish  to  thank  Gary  Goodman  of  the  Stanford  Artificial  Intelligence  Project  for 
his  help  in  furnishing  processed  speech  samples  and  program  listings  to  aid  in 
the  implementation  and  checkout  of  the  system  on  the  IBM  360/67. 

2.  PREPROCESSING 

Preprocessing  consists  of  dividing  the  speech  flow  into  minimal  segments  and 
determining  the  zero-crossing  count  and  the  maximum  amplitude  for  each  of  three 
different  frequency  bands  within  the  speech  spectrum.  The  frequency  bands  are: 

150  Hz  -  900  Hz,  900  Hz  -  2200  Hz,  and  2200  Hz  -  5000  Hz.  As  vowels  contain,  in 
general,  more  reliable  information  than  other  phonemes,  the  choice  of  the  cutoff 
values  was  dictated  by  known  parameter  values  for  the  vowels  as  developed  by 
Peterson and Barney [2].

2.1  AMPLITUDES 

Let an arbitrary wave within a minimal segment be represented by a discrete
function $f_i$, whose values are the ordinates of this wave at equidistant points.
The amplitude of the wave on the minimal segment is then defined to be*

$$\max_{1 \le i \le n} \{f_i\} - \min_{1 \le i \le n} \{f_i\}$$

The amplitude is thus measured from the lowest peak to the highest peak, where
the possible range is from 0 to -10 volts.

Let A1, A2, and A3 denote the amplitudes of the wave on a minimal segment in
the frequency bands 150 - 900 Hz, 900 - 2200 Hz, and 2200 - 5000 Hz, respectively.

*For sine waves, this expression is proportional to the square root of the average
power of the signal over the minimal segment (see Appendix).
The following table illustrates the A-to-D conversion for amplitudes:

Table 1. A-to-D Conversion of Amplitudes

Amplitude (A1,A2,A3)              | Actual Volts into A/D Converter | Raw Binary from A/D Converter (2's Complement) | Output of A/D Conversion Routine: Binary | Octal | Decimal
0                                 | 0    | 100000 000000 | 000000 | 0  | 0
                                  | -4.9 | 111111 111111 | 011111 | 37 | 31
1/2 of maximum possible amplitude | -5   | 000000 000000 | 100000 | 40 | 32
maximum possible amplitude        | -10  | 011111 111111 | 111111 | 77 | 63

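
The rows of Table 1 are consistent with a simple rule: keep the six high-order bits of the 12-bit two's-complement word and invert the sign bit. The sketch below illustrates that reading only; the rule is inferred from the table rather than taken from the program listing, and the name `ad_output` is hypothetical.

```python
def ad_output(raw12: int) -> int:
    """Map a 12-bit two's-complement A/D word to a 6-bit value (assumed rule)."""
    top6 = (raw12 >> 6) & 0b111111  # keep the six high-order bits
    return top6 ^ 0b100000          # invert the sign bit so that 0 V maps to 0


# The four raw words listed in Table 1, from 0 V down to -10 V.
for raw in (0b100000000000, 0b111111111111, 0b000000000000, 0b011111111111):
    out = ad_output(raw)
    print(f"{raw:012b} -> binary {out:06b}, octal {out:o}, decimal {out}")
```

Under this reading the output grows monotonically as the input voltage moves from 0 V toward -10 V, which matches the octal and decimal columns of the table.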

2.2  ZERO-CROSSINGS 

The zero-crossings of the wave on the minimal segment are the number of sign
changes of $f_1, \ldots, f_n$. Let Z1, Z2, and Z3 denote the zero-crossing counts for
the three frequency bands 150 - 900 Hz, 900 - 2200 Hz, and 2200 - 5000 Hz,
respectively. The zero-crossing count is normalized into 7 bits.
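
The two measurements defined in Sections 2.1 and 2.2 can be sketched for one band of one minimal segment as follows. This is a minimal illustration, not the original routine: the function names are hypothetical, and treating a sample that is exactly zero as non-negative is an assumption the text does not settle.

```python
def peak_to_peak(samples):
    """Amplitude as defined in Section 2.1: highest ordinate minus lowest."""
    return max(samples) - min(samples)


def zero_crossings(samples):
    """Number of sign changes between consecutive samples.

    Assumption: a sample equal to zero counts as non-negative.
    """
    signs = [s >= 0 for s in samples]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


segment = [0.0, 1.0, -1.0, 2.0, -3.0]
print(peak_to_peak(segment), zero_crossings(segment))
```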
The following table illustrates the A-to-D conversion of the zero-crossings:

Table 2. A-to-D Conversion of Zero-Crossings

Actual Hz | Zero-Crossings in 10 Msec. | Volts into A/D Converter | Raw Binary from A/D Converter (2's Complement) | Output of A/D Conversion Routine: Octal | Decimal
0         | 0   | 0    | 100000 000000 | 0   | 0
          |     | -4.9 | 111111 111111 |     |
2500      | 50  | -5   | 000000 000000 | 62  | 50
5000      | 100 | -10  | 011111 111111 | 144 | 100


Each zero-crossing unit represents 50 Hz. The range is from 0 - 100, corresponding
to 0 Hz - 5000 Hz. The input hardware is adjusted for an acceptance level of 0.03
V on the original signal, i.e., the zero-crossings are counted only if the amplitude
of the original signal is higher than 0.03 V.
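
The 50 Hz figure follows from the segment length: a sinusoid of frequency f crosses zero 2f times per second, hence f/50 times in a 10 msec minimal segment. A short check of that arithmetic against the rows of Table 2 (the function name is hypothetical):

```python
SEGMENTS_PER_SECOND = 100  # one minimal segment lasts 10 msec


def crossings_to_hz(count: int) -> int:
    """Convert a zero-crossing count per 10 msec segment to Hz."""
    # count crossings per segment -> count * 100 per second -> half as many cycles
    return count * SEGMENTS_PER_SECOND // 2


for count in (0, 50, 100):
    print(count, "crossings ->", crossings_to_hz(count), "Hz")
```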
2.3 THE PREPROCESSING PROGRAM

Originally, the Vicens-Reddy program had a main driver that called a subroutine
KLUDGE to run the recorder, effect the A-to-D conversion (as discussed in
Sections 2.1 and 2.2), and build the Q-matrix.

The KLUDGE subroutine computes the two variables:

MAXAMP = maximum amplitude of all A1, A2, and A3 amplitudes over the whole
speech sample, and

MAXAM1 = maximum of all A1 amplitudes over the whole speech sample,*

and, depending upon their relative values, the following actions were taken:

Table 3. "KLUDGE" Table for MAIN Package

MAXAMP | MAXAM1 | RESULT
≤58    | ≤58    | Message returned to user: "Get a bit closer, please." *
>58    | >62    | No normalization
>58    | ≤62    | Normalization as follows, for all A1, A2, and A3 in all rows of the Q-matrix:
       |        |   A1 = A1·64/MAXAM1
       |        |   A2 = A2·64/MAXAM1
       |        |   A3 = A3·64/MAXAM1
       |        | (If the resulting A2 or A3 was >127, it was reset to 127.)

*This implies that MAXAM1 ≤ MAXAMP.
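
The decision logic of Table 3 can be sketched as below. This is an illustration under stated assumptions, not the original KLUDGE code: the function name and return values are hypothetical, integer (truncating) division is assumed, the thresholds are read as "≤58", ">58", ">62", and "≤62", and MAXAM1 is assumed to be nonzero whenever normalization is applied.

```python
def kludge_main(rows):
    """rows: list of [A1, A2, A3] amplitudes, one entry per minimal segment."""
    maxam1 = max(r[0] for r in rows)   # largest A1 over the sample
    maxamp = max(max(r) for r in rows)  # largest of all A1, A2, A3
    if maxamp <= 58:                    # implies maxam1 <= 58 as well
        return "Get a bit closer, please.", rows
    if maxam1 > 62:
        return "no normalization", rows
    scaled = []
    for a1, a2, a3 in rows:
        scaled.append([
            a1 * 64 // maxam1,
            min(a2 * 64 // maxam1, 127),  # reset to 127 if it overflows 7 bits
            min(a3 * 64 // maxam1, 127),
        ])
    return "normalized", scaled


print(kludge_main([[32, 10, 80], [16, 5, 70]]))
```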
A new program, RECORD, was later written; it contains a modified version of the
subroutine, also called KLUDGE. One of the reasons for RECORD was to free the user from
having to speak immediately after the recorder was started. It incorporates
tests for determining when the start of speech occurs and terminates when there
is no noticeable input for more than 320 msec. RECORD calls KLUDGE to process
each live message being recorded. KLUDGE detects the maximum A1 amplitude, and
if it is less than 50, KLUDGE returns to the RECORD "weak signal" return. The
return is presently handled as if the signal were stronger and goes on to the
smoothing process. If KLUDGE detects a maximum A1 amplitude greater than 50,
which indicates a possible clipping of the signal, it calls CLOS1 to compute the
first closeness values (discussed below), returns to RECORD, indicating that
clipping has occurred, and goes on to smoothing.

KLUDGE determines the A1, Z1, A2, Z2, A3, and Z3 values by calling program
CHKMIC, which runs under the Stanford AI Spacewar mode. CHKMIC is run every
1/60th of a second and processes a maximum of 3 minimal segments each time it
runs. The start of the input is determined by computing the sum

A1 + Z1 + A2 + Z2 + A3 + Z3/4

for successive minimal segments and comparing it to 10. When the sum is equal
to or greater than 10, the input buffer is backed up 7 segments and that is the
start of the first Q-matrix minimal segment. When CHKMIC detects a group of more
than 32 minimal segments (320 msec) without an individual sum exceeding 9, it
stops recording. If the number of acceptable Q-matrix segments is less than or
equal to 20, it considers the speech sample too small and goes back to a
subroutine RETRY to re-initiate the recording. Otherwise, CHKMIC finds the maximum
A1 amplitude after doing the A-D conversion.
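
The start and stop tests described above amount to a simple endpoint detector. The sketch below takes the per-segment sums as already computed, so it does not commit to how the sum itself is formed. The function name, the parameter names, and the choice to end the utterance at the last segment whose sum reached 10 are assumptions made for illustration.

```python
def detect_utterance(sums, start_level=10, backup=7, max_quiet=32, min_segments=20):
    """Return (first, last) segment indices of the utterance, or None for a retry."""
    first_loud = next((i for i, s in enumerate(sums) if s >= start_level), None)
    if first_loud is None:
        return None
    start = max(0, first_loud - backup)  # back the buffer up 7 segments
    quiet = 0
    last_loud = first_loud
    for i in range(first_loud, len(sums)):
        if sums[i] >= start_level:
            quiet = 0
            last_loud = i
        else:
            quiet += 1
            if quiet > max_quiet:        # more than 32 quiet segments (320 msec)
                break
    if last_loud - start + 1 <= min_segments:
        return None                      # sample too small: caller would retry
    return start, last_loud


print(detect_utterance([0] * 10 + [20] * 30 + [0] * 40))
```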

RECORD smoothes A1, Z1, A2, Z2, A3, and Z3 as follows: For each of these six
variables, three adjacent values (called FIRST, SECOND, and THIRD in the
program) are considered. If

(SECOND-FIRST)-(THIRD-SECOND) :0
then the second value is smoothed as follows:


SECOND 


FIRST+SECONDfTHIRD  .  r 

3  3 
On the other hand, if

(SECOND-FIRST)- (THIRD-SECOND)  *0 
then  no  smoothing  occurs. 


The maximum amplitude for each frequency band is then defined as follows:

MAXA1 = maximum amplitude in range 150 - 900 Hz
MAXA3 = maximum amplitude in range 900 - 2200 Hz
MAXA5 = maximum amplitude in range 2200 - 5000 Hz

Let

M = max {MAXA1, MAXA3, MAXA5}
Depending upon the value of M, one of the actions shown in Table 4 is taken:

Table 4. Modified "KLUDGE" Table

Range of M  | Result
M < 30      | Compute FMAXA = max {MAXA1, MAXA3} and
            | ATFAC = 1 IFIX {i log [remainder of formula illegible],
            | where IFIX(V) = integer part of V, and A - average of A - 31.
            | User receives the response "Raise the volume by about <ATFAC>."
30 ≤ M < 62 | Normalization by RECORD as follows:
            |   A1 = A1·63/MAXA1,
            |   A2 = A2·63/MAXA1,
            |   A3 = A3·63/MAXA1.
M ≥ 62      | User receives the response "Signal is clipping" and
            | RECORD normalizes the amplitudes as follows:
            |   A1 = A1·63/MAXA1,
            |   A2 = A2·63/MAXA1,
            |   A3 = A3·63/MAXA1.

System Development Corporation
The old normalizing routine used with MAIN and its accompanying KLUDGE checked
the A2 and A3 values to make sure they did not exceed 7 bits (i.e., 127) after
normalizing. The new normalization routine used by RECORD and its accompanying
KLUDGE does not make this check after normalizing. It appears possible (but
highly unlikely) that if any one of the ratios A1/MAXA1, A2/MAXA1, A3/MAXA1 is >2,
then the renormalized A2 and A3 values would exceed 7 bits.
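
The overflow concern can be checked with a few lines of arithmetic. Since 63·2 = 126 still fits in 7 bits, a band amplitude has to be slightly more than twice MAXA1 before the scaled value passes 127. The sketch below assumes integer (truncating) division and uses hypothetical function names; the action labels paraphrase Table 4.

```python
def record_action(m: int) -> str:
    """Select the Table 4 action from M = max{MAXA1, MAXA3, MAXA5}."""
    if m < 30:
        return "raise the volume"
    if m < 62:
        return "normalize"
    return "signal is clipping; normalize"


def record_normalize(a: int, maxa1: int) -> int:
    """Scale one amplitude by 63/MAXA1 with no 7-bit check (assumed integer division)."""
    return a * 63 // maxa1


maxa1 = 20
for a2 in (20, 40, 41):
    scaled = record_normalize(a2, maxa1)
    print(a2, "->", scaled, "(exceeds 7 bits)" if scaled > 127 else "")
```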


3. SEGMENTATION

3.1 THE Q-MATRIX

The resulting digitized data are arranged in an array, called the Q-matrix,
comprised of 7 columns and 500 rows. Each row represents a minimal segment
(10 msec) of speech data; row i of Q will be referred to as Q(i). A maximum
of 5000 msec = 5 sec of speech is thus allowable. Let Q = (q_{i,j}), where
i = 1, ..., 500 and j = 1, ..., 7. Then the 7 columns of Q are defined as
follows:

q_{i,1} = A1Q(i) = amplitude in the range 150 Hz - 900 Hz for Q(i)
q_{i,2} = Z1Q(i) = zero-crossings in the range 150 Hz - 900 Hz for Q(i)
q_{i,3} = A2Q(i) = amplitude in the range 900 Hz - 2200 Hz for Q(i)
q_{i,4} = Z2Q(i) = zero-crossings in the range 900 Hz - 2200 Hz for Q(i)
q_{i,5} = A3Q(i) = amplitude in the range 2200 Hz - 5000 Hz for Q(i)
q_{i,6} = Z3Q(i) = zero-crossings in the range 2200 Hz - 5000 Hz for Q(i)
q_{i,7} = CLOVAL(i) = closeness value (to be defined below)
The matrix Q thus has the following appearance:

Table 5. The Q-Matrix

Data for Q(1):    A1Q(1)    Z1Q(1)    A2Q(1)    Z2Q(1)    A3Q(1)    Z3Q(1)    CLOVAL(1)
Data for Q(2):    A1Q(2)    Z1Q(2)    A2Q(2)    Z2Q(2)    A3Q(2)    Z3Q(2)    CLOVAL(2)
      .              .         .         .         .         .         .          .
      .              .         .         .         .         .         .          .
Data for Q(500):  A1Q(500)  Z1Q(500)  A2Q(500)  Z2Q(500)  A3Q(500)  Z3Q(500)  CLOVAL(500)

By convention, CLOVAL(1) = CLOVAL(500) = 0.


The first computation of CLOVAL(i) occurs by the main program which calls
subroutine CLOS1 (described in Section 3.6.1) before the actual segmentation
subroutine is entered. The first computation of CLOVAL(i) is based on computing
the closeness between Q(i-1) and Q(i+1) for all segments in the Q-matrix.*


*This appears to be at odds with statements in [1] such as "The Purpose of the
primary  segmentation  procedure  is  to  group  together  similar  adjacent  minimal 
segments  which  are  produced  by  the  preprocessing  procedure."  However,  Reddy  [3] 
(an  interesting  discussion  of  closeness  and  tolerance  interval  on  pages  43-47) 
has  said:  "It  might  so  happen  that  the  choice  of  a  minimal  segment  is  such 
that  it  falls  between  two  peaks  of  the  speech  wave  (i.e.,  between  two  pitch 
periods).  One  way  to  correct  this  would  be  to  choose  a  minimal  segment  interval 
so  that  it  will  include  at  least  one  pitch  period  whenever  voicing  is  present. 
However,  this  will  decrease  the  precision  with  which  segmentation  may  be 
achieved.  This  difficulty  may  also  occur  due  to  the  irregularities  of  the 
vocal  apparatus.  It  is  corrected  by  using  the  rule  that  if  two  segments  are 
about  the  same  intensity  level  and  have  about  the  same  number  of  zero  crossings 
and  if  they  are  sufficiently  close  (20  ms)  to  each  other,  although  not  adjacent, 
they  can  be  grouped  to  form  one  segment." 
3.2 PRIMARY SEGMENTATION

Primary segmentation groups together similar minimal segments (on the basis of
the closeness values) and labels them "sustained" or "transitional." Sustained
segments consist of a string of minimal adjacent segments having a positive
closeness value and include as their first minimal segment the previously adjacent
minimal segment with a negative closeness value. If the first Q-matrix minimal
segment begins with a positive closeness value (which is almost always the case),
the first sustained segment will consist of only adjacent segments with positive
closeness values. All other minimal segments not a part of sustained segments
become grouped into transitional segments.

The rationale for this method of grouping is the way in which the first closeness
value is computed. A negative closeness for Q(i) indicates a lack of closeness
between Q(i-1) and Q(i+1). If the negative closeness of Q(i) is followed by a
positive closeness for Q(i+1), this indicates a closeness between Q(i) and Q(i+2).
The beginning of the sustained segment is taken to be at Q(i). Also, the last
Q(i) in a string of adjacent segments with positive closeness values indicates
a closeness between Q(i-1) and Q(i+1). The negative closeness value for Q(i+1)
indicates a lack of closeness between Q(i) and Q(i+2). The end of the sustained
segment is taken to be Q(i). The combining rules are summarized in the following
two diagrams:


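[Diagrams: closeness value by segment number at the two boundaries of a sustained segment]

Beginning of a sustained segment:

Segment #   Closeness Value   Grouping
i           -                 first minimal segment of the sustained segment
i+1         +                 sustained segment

End of a sustained segment:

Segment #   Closeness Value   Grouping
i-1         +                 sustained segment
i           +                 last minimal segment of the sustained segment
i+1         -                 transitional segment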
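
The grouping rule of this section can be sketched as a single pass over the closeness values. This is an illustration, not the original SEGMENT subroutine: the function name and the 0-based inclusive index pairs are hypothetical, and a closeness value of exactly zero is treated as "not positive", which the text does not specify.

```python
def primary_segments(cloval):
    """Group minimal segments into ("SUST" | "TR", first, last) triples."""
    n = len(cloval)
    segments = []
    tr_start = None  # start of the pending run of non-positive segments
    i = 0
    while i < n:
        if cloval[i] > 0:
            j = i
            while j < n and cloval[j] > 0:  # extend the run of positive values
                j += 1
            start = i
            if tr_start is not None:
                start = i - 1               # absorb the preceding negative segment
                if tr_start <= i - 2:
                    segments.append(("TR", tr_start, i - 2))
                tr_start = None
            segments.append(("SUST", start, j - 1))
            i = j
        else:
            if tr_start is None:
                tr_start = i
            i += 1
    if tr_start is not None:
        segments.append(("TR", tr_start, n - 1))
    return segments


print(primary_segments([-1, -1, 2, 3, -1, -1, -1]))
```

Note that a single negative value between two positive runs becomes the first minimal segment of the second sustained segment, so the two sustained segments stay distinct rather than merging.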
3.3 THE P-MATRIX

The composite segments resulting from the above primary segmentation process lead
to the construction of a new array, called the P-matrix, which contains 25 columns
and 200 rows. Each row of the P-matrix contains data relating to a segment either
minimal or larger; row i of P will be referred to as P(i). An arbitrary row of P
will be called a P-segment. Let P = (p_{i,j}), where i = 1, ..., 200 and
j = 1, ..., 25. Then the 25 columns of P are described as follows:

p_{i,1} = SBG(i): The variable SBG(i) + 1 points to the beginning
minimal segment in the Q-matrix that identifies
the start of the larger P-matrix segment.

p_{i,2} = SND(i) / TYPE(i): This column is used for successive storage of
the two variables as shown; initially, p_{i,2}
is filled with SND(i) as used in subroutine
SEGMENT to point to the ending minimal segment
in the Q-matrix that identifies the end of the
larger P-matrix segment. Later, SND(i) is
replaced by TYPE(i), which is used in the
recognition subroutine to define vowel, burst,
consonant, stop, etc.

p_{i,3} = DUR(i): Number of minimal segments in the P-matrix segment;
and also, because each minimal segment is 10 ms in
length, DUR(i) is the time duration of the segment
in 10 ms units.

p_{i,4} = A1MN(i): The minimum amplitude in the range 150 Hz - 900 Hz
of all minimal segments making up the larger
P-matrix segment.
p_{i,5} = A1(i): The average amplitude in the range 150 Hz - 900 Hz
of all minimal segments making up the larger
P-matrix segment, computed as follows:

Let

J1B = SBG(i) + 1 + BETA,

where BETA = 0 for the initial construction of the
P-matrix,* and let

J1E = SND(i) - BETA.

Then

$$A1(i) = \frac{\sum_{j=J1B}^{J1E} A1Q(j) + (J1E - J1B)}{J1E - J1B + 1}$$

p_{i,6} = A1MX(i): The maximum amplitude in the range 150 Hz - 900 Hz
of all minimal segments making up the larger P-matrix
segment.

p_{i,7} = Z1MN(i): The minimum zero-crossing count in the range 150
Hz - 900 Hz of all minimal segments making up the
larger P-matrix segment.

p_{i,8} = Z1(i): The average zero-crossing count in the range 150
Hz - 900 Hz of all minimal segments making up the
larger P-matrix segment, computed as follows:

Let J1B and J1E be defined as above. Then

$$Z1(i) = \frac{\sum_{j=J1B}^{J1E} Z1Q(j)}{J1E - J1B + 1}$$

p_{i,9} = Z1MX(i): The maximum zero-crossing count in the range 150
Hz - 900 Hz of all minimal segments making up the
larger P-matrix segment.

p_{i,10} = A2MN(i): The minimum amplitude in the range 900 Hz - 2200 Hz
of all minimal segments making up the larger P-matrix
segment.

*Later modifications of the P-matrix will require other (possibly non-zero)
values of BETA, and these calculations will be described in a later section
on the CREAT2 and CREAT4 subroutines.
p_{i,11} = A2(i): The average amplitude in the range 900 Hz - 2200 Hz
of all minimal segments making up the larger P-matrix
segment; A2(i) is computed in the same manner as
A1(i), with A2Q(j) replacing A1Q(j).

p_{i,12} = A2MX(i): The maximum amplitude in the range 900 Hz - 2200 Hz
of all minimal segments making up the larger P-matrix
segment.

p_{i,13} = Z2MN(i): The minimum zero-crossing count in the range 900
Hz - 2200 Hz of all minimal segments making up the
larger P-matrix segment.

p_{i,14} = Z2(i): The average zero-crossing count in the range 900
Hz - 2200 Hz of all minimal segments making up the
larger P-matrix segment; Z2(i) is computed in the
same manner as Z1(i), with Z2Q(j) replacing Z1Q(j).

p_{i,15} = Z2MX(i): The maximum zero-crossing count in the range 900
Hz - 2200 Hz of all minimal segments making up the
larger P-matrix segment.

p_{i,16} = A3MN(i): The minimum amplitude in the range 2200 Hz - 5000
Hz of all minimal segments making up the larger
P-matrix segment.

p_{i,17} = A3(i): The average amplitude in the range 2200 Hz - 5000
Hz of all minimal segments making up the larger
P-matrix segment; A3(i) is computed in the same
manner as A1(i), with A3Q(j) replacing A1Q(j).

p_{i,18} = A3MX(i): The maximum amplitude in the range 2200 Hz - 5000
Hz of all minimal segments making up the larger
P-matrix segment.

p_{i,19} = Z3MN(i): The minimum zero-crossing count in the range 2200
Hz - 5000 Hz of all minimal segments making up the
larger P-matrix segment.
p_{i,20} = Z3(i): The average zero-crossing count in the range 2200
Hz - 5000 Hz of all minimal segments making up the
larger P-matrix segment; Z3(i) is computed in the
same manner as Z1(i), with Z3Q(j) replacing Z1Q(j).

p_{i,21} = Z3MX(i): The maximum zero-crossing count in the range 2200
Hz - 5000 Hz of all minimal segments making up the
larger P-matrix segment.

p_{i,22} = MK(i) / CLO(i): This column is used for successive storage of the
two variables as shown; initially, p_{i,22} is filled
with MK(i), which is a logical marker in looking
for a parameter with large variation within a
P-segment. MK(i) = .TRUE. if there is a variable
parameter such that one of the Q-segments within
the P-segment has a closeness value (CLOVAL) ≤ 7.
Later, MK(i) is replaced by CLO(i), a measure of
the closeness between P-segments as calculated
by function PROXIM (see Section 3.6.2).

p_{i,23} = SXT(i): SXT(i) = 0 if the segment is not a local minimum
or maximum as determined by pseudo-subroutine MINMAX,
SXT(i) = 1 if the segment is a local maximum,
SXT(i) = -1 if the segment is a local minimum.

p_{i,24} = BPT(i): The logical pointer for the physical row P(i).

p_{i,25} = NAT(i): The description of the segment, i.e., SUST
(sustained) or TR (transitional).

The initial configuration of the P-matrix is shown in Table 6.
Table 6. Initial Configuration of the P-Matrix

Data for P(1):
SBG(1) SND(1) DUR(1) A1MN(1) A1(1) A1MX(1) Z1MN(1) Z1(1) Z1MX(1) A2MN(1) A2(1) A2MX(1)
Z2MN(1) Z2(1) Z2MX(1) A3MN(1) A3(1) A3MX(1) Z3MN(1) Z3(1) Z3MX(1) MK(1) SXT(1) BPT(1) NAT(1)

Data for P(2):
SBG(2) SND(2) DUR(2) A1MN(2) A1(2) A1MX(2) Z1MN(2) Z1(2) Z1MX(2) A2MN(2) A2(2) A2MX(2)
Z2MN(2) Z2(2) Z2MX(2) A3MN(2) A3(2) A3MX(2) Z3MN(2) Z3(2) Z3MX(2) MK(2) SXT(2) BPT(2) NAT(2)

   .
   .

Data for P(200):
SBG(200) SND(200) DUR(200) A1MN(200) A1(200) A1MX(200) Z1MN(200) Z1(200) Z1MX(200) A2MN(200) A2(200) A2MX(200)
Z2MN(200) Z2(200) Z2MX(200) A3MN(200) A3(200) A3MX(200) Z3MN(200) Z3(200) Z3MX(200) MK(200) SXT(200) BPT(200) NAT(200)
3.4 SECONDARY SEGMENTATION

Once the P-matrix has been constructed, it is subjected to various modifications.
These changes include the suppression of so-called "noisy" segments and a
refinement of the segmentation of the speech sample by checking intra-segment
variation.

3.4.1 Suppression of Noisy Segments

The secondary segmentation routine suppresses the noisy segments at the beginning
and end of the speech utterance. Noisy segments are defined as being those
adjacent segments from the beginning minimal segment forward for which A1MX(i) <
8 or Z3MX(i) £ 40; or those adjacent segments from the ending minimal segment
backwards for which A1MX(i) < 8 or Z3MX(i) £ 40.

3.4.2 Intra-Segment Variation

All sustained segments are checked for internal variations and broken down into
smaller segments if necessary. This is done by using the function VARFUN
(Section 3.7) to check the variation within a row of the P-matrix and to flag
the most variable parameter, if any. New closeness indices are then computed for
the Q-matrix segments making up the P-matrix segment with an increased weight
for the most variable parameter. To accomplish this, function CLOS1 is called to
compute the value of CLOVAL(j), which is computed as the closeness value between
Q(j-2) and Q(j+1), beginning with j = SBG(i) + 1, and ending with j = SND(i);
however, if j-2 < 1, we set j-2 = 1 and if j+1 > SIZEP (the number of segments
in the P-matrix), we set j+1 = SIZEP.

If there exists a segment Q(j) within P(i) for j = SBG(i) + 2* to j = SND(i) for
which CLOVAL(j) ≤ 7, P(i) is subdivided. Any newly created P-matrix sustained
segments are then checked for internal variation again, and the above process is
repeated until CLOVAL(j) > 7 for all Q(j) within P(i) from j = SBG(i) + 2 to
j = SND(i).

*Although the previous calculations treat Q(j) beginning with j = SBG(i) + 1, this
set of tests begins with j = SBG(i) + 2.
If, after processing all P(i) from i = 1 to i = SIZEP in the above manner, it has
been determined that at least one P(i) has internal variation, the entire
P-matrix is recompacted into new P-segments, and the program again starts checking
all sustained P-segments for internal parameter variation, etc. The process ends
when the entire P-matrix is searched for sustained segments with internal variation
and none are found.

The P-matrix is then sorted into sequential order; noisy segments at the beginning
and end are suppressed as above; the P-matrix is compacted and pointers are
assigned as follows:

INUSE(i): Points to the physical P-matrix row number (see Section 3.9)

BPT(i): Points to the logical P-matrix row number (see Section 3.9)

Finally, subroutine MINMAX is applied to each segment to determine if it is a
local minimum or maximum, as defined below.

3.4.3 Pseudo-Subroutine* MINMAX and the Creation of New Sustained Segments

In order to identify P-segments which contain a local maximum or a local minimum,
we proceed as follows:

Let j = INUSE(i) as given above, and define

$$\mathrm{SCRAT}(i) = 2 \cdot A1(j) + A2(j) + \frac{A3(j) + 1}{2} + \mathrm{DUR}(j)$$

for all i beginning with i = 1 and ending with i = SIZEP.

SCRAT(i) can be interpreted as a smoothing process on the values A1(j), A2(j),
and A3(j). The weights 2, 1, and 1/2 can be justified by observing that A1(j)
is defined over the frequency range 150 - 900 Hz, A2(j) is defined over 900 - 2200
Hz, and A3(j) over 2200 - 5000 Hz. The lengths of these three intervals are
750, 1300, and 2800, respectively, and are thus in the approximate ratio
1/2 : 1 : 2 with one another. Therefore, in order to weight A1(j), A2(j), and
A3(j) fairly, we must multiply A1(j) by 2, A2(j) by 1, and A3(j) by 1/2. To
make SCRAT(i) duration-dependent, the term DUR(j) is added. The "+1"
appearing in the third term of SCRAT(i) is a rounding factor. Figure 1
illustrates the computation of SCRAT(i), which can be thought of as a crude
approximation to the power spectrum of the data for the sustained segment.

Now if

SCRAT(i) ≤ SCRAT(i-1) - 10 and SCRAT(i) ≤ SCRAT(i+1) - 10,

then segment j = INUSE(i) is said to be a local minimum, and we set
SXT(j) = -1.

Alternatively, if

SCRAT(i) ≥ SCRAT(i-1) + 10 and SCRAT(i) ≥ SCRAT(i+1) + 10,

then segment j = INUSE(i) is said to be a local maximum, and we set
SXT(j) = 1.

If neither set of inequalities is satisfied, it may still be possible to define
a local extremum. Suppose, for example, that there exists an integer i such
that

SCRAT(i) ≤ SCRAT(i-1) - 10

and

|SCRAT(k) - SCRAT(k+1)| < 10

for k = i, i+1, ..., i+n-1 for some integer n. If we now have that

SCRAT(i+n) ≤ SCRAT(i+n+1) - 10,

then the local minimum is spread over the segments i, i+1, ..., i+n. In
this case, the segment of longest duration is said to be the local minimum.

An analogous set of tests leads to the development of a local maximum spread
over several segments. If no local extremum is found for segment j, we set
SXT(j) = 0.

*A pseudo-subroutine is an internal subroutine used in a larger program and is
entered and exited by assigning return addresses from one routine to the next.
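
The MINMAX tests can be sketched as follows. This is a minimal illustration under stated assumptions rather than the original pseudo-subroutine: the function names are hypothetical, SCRAT uses truncating integer division (which is what makes the "+1" act as a rounding factor), the first and last segments are never marked as extrema, the INUSE indirection is omitted, and ties in duration go to the earliest segment.

```python
def scrat(a1, a2, a3, dur):
    """Weighted amplitude score: 2*A1 + A2 + (A3 + 1)/2 + DUR."""
    return 2 * a1 + a2 + (a3 + 1) // 2 + dur


def local_extrema(s, dur, margin=10):
    """Return SXT flags (+1 maximum, -1 minimum, 0 neither) for scores s."""
    n = len(s)
    sxt = [0] * n
    i = 1
    while i < n - 1:
        step = s[i] - s[i - 1]
        sign = 1 if step >= margin else -1 if step <= -margin else 0
        if sign:
            k = i
            # Extend over a plateau whose neighbouring scores differ by < margin.
            while k < n - 1 and abs(s[k] - s[k + 1]) < margin:
                k += 1
            if k < n - 1 and sign * (s[k] - s[k + 1]) >= margin:
                # The extremum is spread over i..k: flag the longest segment.
                best = max(range(i, k + 1), key=lambda t: dur[t])
                sxt[best] = sign
                i = k + 1
                continue
        i += 1
    return sxt


scores = [50, 80, 50, 20, 22, 60]
durations = [1, 1, 1, 2, 5, 1]
print(local_extrema(scores, durations))
```

In the example, segment 1 is a sharp maximum, while the minimum is spread over segments 3 and 4 and is assigned to segment 4 because it has the longer duration.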

The P-matrix segments are examined and any transitional segment containing a
local minimum or maximum with a duration = 1 (i.e., a 10 msec minimal segment)
is renamed "sustained." Any transitional segment containing a local minimum or
maximum with a duration greater than 1 (i.e., more than a 10 msec minimal
segment) is broken up into more than one segment. If the transitional P-segment
has a local maximum, then the Q-segment within the P-segment having the largest
amplitude in the frequency range 150 - 900 Hz is labeled "sustained" and flagged
as a local maximum. On the other hand, if the transitional segment was a local
minimum, then the Q-segment within the P-segment having the smallest amplitude
in the frequency range 150 - 900 Hz is labeled "sustained" and flagged as a local
minimum. The rest of the transitional segment is then labeled "transitional"
with no local extrema. Note that if the minimal segment containing the extremum
is bounded by other minimal segments within the larger transitional P-segment,
the larger transitional P-segment is broken into three segments: a transitional
segment preceding the minimal sustained segment, the minimal sustained segment,
and a transitional segment following the minimal sustained segment.*


*Justification for this process is given by the following ([1], p. 66):

"Local  maxima  and  minima  of  the  amplitude  of  the  waveform  are  phonemically 
significant  (they  usually  represent  significant  vowels  and  consonants).  When 
a  phoneme  is  articulated  for  a  very  short  period  of  time,  it  has  a  rapidly 
varying  on-glide  and  off-glide.  When  closeness  indices  are  computed  for  this 
portion  of  the  sound,  one  may  find  that  no  two  adjacent  segments  satisfy  our 
definition  of  being  close.  Thus,  they  may  end  up  being  part  of  a  longer  trans¬ 
itional  segment.  A  special  effort  is  made  to  detect  and  recover  such  extrema 
by  searching  the  transitional  segments.  In  this  case,  the  original  transitional 
segment  is  replaced  by  two  or  more  segments,  the  local  extremum  being  a  10  ms 
sustained  segment." 
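
The splitting of a transitional P-segment around its extremum can be sketched as below. The function name, the 0-based inclusive Q indices, and the triple format are hypothetical, chosen only to make the rule concrete; ties in A1Q are resolved in favour of the earliest minimal segment, which the text does not specify.

```python
def split_transitional(a1q, start, end, kind):
    """Split Q rows start..end around the extreme A1Q value.

    kind is +1 for a local maximum, -1 for a local minimum.
    Returns ("TR" | "SUST", first, last) triples.
    """
    pick = max if kind > 0 else min
    ext = pick(range(start, end + 1), key=lambda j: a1q[j])
    parts = []
    if ext > start:
        parts.append(("TR", start, ext - 1))  # transitional part before the extremum
    parts.append(("SUST", ext, ext))          # 10 msec sustained segment
    if ext < end:
        parts.append(("TR", ext + 1, end))    # transitional part after the extremum
    return parts


print(split_transitional([5, 9, 30, 12, 4], 0, 4, +1))
```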
Consideration is also given to one other type of segment contained within a
transitional segment; this is a very short burst segment, characterized by the
following three conditions:

(1) Z3Q(i) £ .'5 or Z3Q(i) + Z2Q(i) * 50,
and

(2) A3Q(i) ≥ A1Q(i),
and

(3) A1Q(i) < 6.

Such segments are made into sustained P-segments. The rest of the P-segment to
which they belonged becomes either one or two transitional P-segments, depending
on where the short burst occurred within the larger segment: If the short burst
occurred after the first minimal segment and before the last, then the short burst
will be a sustained segment bounded by transitional segments. This process is
repeated until there are no more transitional segments containing a local extremum
or a short burst.

Adjacent transitional segments are then grouped together to form one transitional
segment, and the P-matrix is recompacted. Then, for each transitional P-segment
of duration ≥ 3, the closeness values CLOVAL(i) for its minimal Q-segments are
recomputed, where CLOVAL(i) is a measure of the closeness between Q(i-1) and
Q(i+1). A loop is then established to additively increase the closeness values
of all minimal segments by 1, to a maximum of 6, in order to be able to identify
a sustained segment within this larger transitional segment. If a non-negative
closeness value is found, the larger transitional P-segment of duration ≥ 5 is
resegmented so that the minimal segments having non-negative closeness
values can be made into a sustained segment.

The  P-matrix  is  then  recompacted,  resorted,  and  the  local  minima  and  maxima  are 
recomputed.  The  new  P-matrix  is  then  examined  for  any  transitional  segments 
with a local minimum or maximum. If one is found, the program recurses to
examine all transitional segments for local extrema, short bursts, etc. When
the  P-matrix  contains  no  transitional  segments  with  local  extrema,  the  combining 
process  is  begun. 

3.5  COMBINING 

The  purpose  of  the  combining  process  is  to  group  together  acoustically  similar 
P-segments.  This  task  is  performed  by  first  treating  the  transitional  segments 
and  then  combining  sustained  segments  which  are  similar  in  the  sense  of  the 
definition  given  below.  In  general,  the  transitional  segments  are  considered  to 
be  null-segments  as  defined  in  [4],  pp.  337-342.  It  is  assumed  that  they  do 
not  contain  any  pertinent  information,  and  a  special  effort  is  made  to  reduce 
the  number  of  such  segments  as  much  as  possible.  This  is  done  by  extending  the 
sustained  segments  onto  the  transitional  segments  if  their  parameters  satisfy 
the  tests  given  below. 


For any given P-segment, say P(i), a set of lower bounds INLIM(n) (n = 1, ..., 6)
and upper bounds SUPLIM(n) (n = 1, ..., 6) are computed as follows, using the
average amplitudes A1(i), A2(i), and A3(i) and average zero-crossings Z1(i),
Z2(i), and Z3(i) of P(i):

n    INLIM(n)                    SUPLIM(n)

1    A1(i) - (A1(i)/8 + 1)       A1(i) + (A1(i)/8 + 1)
2    Z1(i) - (Z1(i)/10 + 1)      Z1(i) + (Z1(i)/10 + 1)
3    A2(i) - (A2(i)/8 + 1)       A2(i) + (A2(i)/8 + 1)
4    Z2(i) - (Z2(i)/10 + 1)      Z2(i) + (Z2(i)/10 + 1)
5    A3(i) - (A3(i)/8 + 1)       A3(i) + (A3(i)/8 + 1)
6    Z3(i) - (Z3(i)/10 + 1)      Z3(i) + (Z3(i)/10 + 1)

3.5.1  Extending Sustained Segments onto Adjacent Transitional Segments

For every transitional P-segment (say P(j)), except for the first P-segment, the
bounds INLIM(n) and SUPLIM(n) (n = 1, ..., 6) are computed for the previous
sustained P-segment (call it P(i)). Beginning with the first minimal Q-segment
of P(j) and continuing until the ending minimal Q-segment, the following tests
are performed on the parameters A1Q( ), Z1Q( ), A2Q( ), Z2Q( ), A3Q( ), and Z3Q( ):

INLIM(1) ≤ A1Q( ) ≤ SUPLIM(1)

INLIM(2) ≤ Z1Q( ) ≤ SUPLIM(2)

INLIM(3) ≤ A2Q( ) ≤ SUPLIM(3)

INLIM(4) ≤ Z2Q( ) ≤ SUPLIM(4)

INLIM(5) ≤ A3Q( ) ≤ SUPLIM(5)

INLIM(6) ≤ Z3Q( ) ≤ SUPLIM(6)


Testing stops at the first Q-segment for which more than one parameter is
outside the bounds. P(i) is extended, by using subroutine CREAT4 (described
below), to include all Q-segments of P(j) up to that Q-segment. P(j) then
becomes a transitional segment beginning with the first Q-segment that had more
than one parameter not falling within the bounds computed for P(i).
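The forward test can be sketched as follows. This is a minimal illustration, not the original routine: the function names are hypothetical, each Q-segment is assumed to be a sequence of its six parameters in the order A1Q, Z1Q, A2Q, Z2Q, A3Q, Z3Q, and the bounds are passed in already computed.

```python
def count_out_of_bounds(q_row, inlim, suplim):
    """Count how many of the six parameters of one Q-segment lie outside
    the closed interval [INLIM(n), SUPLIM(n)]."""
    return sum(1 for value, lo, hi in zip(q_row, inlim, suplim)
               if not (lo <= value <= hi))


def forward_extension_length(q_rows, inlim, suplim):
    """Return how many leading Q-segments of the transitional segment P(j)
    can be absorbed by the preceding sustained segment P(i).

    Scanning stops at the first Q-segment with more than one parameter
    outside the bounds; a single out-of-bounds parameter is tolerated.
    """
    absorbed = 0
    for q_row in q_rows:
        if count_out_of_bounds(q_row, inlim, suplim) > 1:
            break
        absorbed += 1
    return absorbed
```

The backward test against the following sustained segment would apply the same function to the Q-segments of P(j) in reverse order.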

The same procedure is now applied to the sustained P-segment following P(j):
for the above transitional P(j), the bounds INLIM(n) and SUPLIM(n) (n = 1, ..., 6)
are computed for the following sustained P-segment (call it P(k)). Beginning
with the last minimal Q-segment of P(j) and continuing backward to the first
minimal Q-segment, the parameters A1Q( ), Z1Q( ), A2Q( ), Z2Q( ), A3Q( ), and
Z3Q( ) are tested to see if they lie within the corresponding bounds of P(k).

Testing stops at the first Q-segment (scanning backward) for which more than
one parameter is outside the bounds. P(k) is extended backwards to include all
minimal Q-segments of P(j) back to that Q-segment. P(j) then becomes a
transitional segment ending with the first Q-segment that has more than one
parameter not falling within the bounds computed for P(k).


The P-matrix is then recompacted, sorted, and local minima and maxima for the
new segments are identified using pseudo-subroutine MINMAX.

In order to combine P-segments, we determine whether or not two P-segments are
similar by using function PROXIM (see Section 3.6.2). The parameters used in
the closeness index computation are now the average parameters for the P-segments,
viz., A1(i), Z1(i), A2(i), Z2(i), A3(i), and Z3(i).


For  every  P-segment  except: 

(1)  those  containing  a  local  extremum  and  consisting  of  one  minimal 
Q-segment 

and 

(2)  those  containing  no  local  extremum  and  consisting  of  either  one,  two, 
three  or  four  minimal  Q-segments, 

the closeness between P(i) and P(i+1) is computed by function PROXIM. The
result is stored in CLO(i+1). For cases (1) and (2) above, however, the closeness
value is computed between P(i+1) and a "pseudo-segment" based upon P(i).

This "pseudo-segment" is created by subroutine CREAT4 and is computed in row
number SIZEPM, the last row of the P-matrix, which is used as a "scratch" area.
CREAT4 is called after setting

SEND = SND(i), SBEG = SND(i) - 5 and LENSEG = 5.


CREAT4  then  computes 

beta  .  urnra. -JL 


which in this case is equal to 1, and then calculates the following parameters for
row m = SIZEPM:

A1MN(m) ,  A1(m) ,  A1MX(m)

Z1MN(m) ,  Z1(m) ,  Z1MX(m)

A2MN(m) ,  A2(m) ,  A2MX(m)

Z2MN(m) ,  Z2(m) ,  Z2MX(m)

A3MN(m) ,  A3(m) ,  A3MX(m)

Z3MN(m) ,  Z3(m) ,  Z3MX(m)

In the case in which the P-segment consists of one minimal Q-segment, the above
parameters are computed from the three minimal Q-segments immediately preceding
the P-segment. If the P-segment consists of two Q-segments, the parameters are
computed from the first Q-segment of the P-segment plus the preceding two
Q-segments. For the case of three Q-segments, the calculations are based upon
the first two Q-segments of the P-segment and the preceding Q-segment. Finally,
for four Q-segments, the first three Q-segments of the P-segment are used.
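The four cases above all select a window of three consecutive Q-segments whose position depends only on the duration of the P-segment. A minimal sketch, assuming Q-segments are identified by integer row numbers and using a hypothetical function name:

```python
def pseudo_segment_window(first_q, duration):
    """Return the row numbers of the three Q-segments used to build the
    pseudo-segment for a short P-segment.

    first_q  -- row number of the first Q-segment of the P-segment
    duration -- number of Q-segments in the P-segment (1 to 4)
    """
    if not 1 <= duration <= 4:
        raise ValueError("a pseudo-segment is only built for durations 1 to 4")
    # The window ends two rows before the end of the P-segment:
    # duration 1 -> three preceding rows, duration 4 -> first three rows.
    start = first_q + duration - 4
    return list(range(start, start + 3))
```

For a P-segment starting at row 10, durations 1 through 4 give the windows 7 to 9, 8 to 10, 9 to 11 and 10 to 12 respectively.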

After CLO(i+1) is computed, a check is made to see if the duration of the
combination of P(i) and P(i+1) is greater than 300 msec, and if so, we set
CLO(i+1) = -30. Otherwise, CLO(i+1) is left equal to its previously computed
value.

3.5.2  Combining  Segments 

Let i = 2. An attempt is now made to combine P(i) with the P-segment immediately
preceding it, which might not be P(i-1) if P(i-1) has previously been combined
with some other P-segment. When we have found this immediately preceding P-segment,
we let I1 = the P-matrix row number of this segment. Let I2 = i and I3 = i+1.

If i = SIZEP, let I3 = 0.


Because of the complexity involved in the combining process, the heuristics used
to decide whether to combine P(I2) with P(I1) are illustrated by the flow chart
in Figure 2. It is important to be able to interpret these heuristics in light
of various mathematical techniques.

Consider, for example, the inequality

|A1(I1) + A2(I1) - A1(I2) - A2(I2)| ≥ 17 - DUR(I1) - DUR(I2).

Rewrite this in the form

\[
\left| \frac{A1(I1) - A1(I2)}{\tfrac{1}{2}\,(DUR(I1) + DUR(I2))}
     + \frac{A2(I1) - A2(I2)}{\tfrac{1}{2}\,(DUR(I1) + DUR(I2))} \right|
\;\ge\;
\frac{1}{4}\left[ \frac{34}{\tfrac{1}{2}\,(DUR(I1) + DUR(I2))}
                + \frac{34}{\tfrac{1}{2}\,(DUR(I1) + DUR(I2))} \right] - 2 .
\]

The term

\[ \frac{A1(I1) - A1(I2)}{\tfrac{1}{2}\,(DUR(I1) + DUR(I2))} \]

measures the rate of change of A1(i) as we proceed from segment P(I1) to segment
P(I2). A similar statement applies to the term

\[ \frac{A2(I1) - A2(I2)}{\tfrac{1}{2}\,(DUR(I1) + DUR(I2))} . \]

To identify the significance of the right-hand side of the inequality, we proceed
as follows: consider A1(i) and A2(i) as discrete random variables selected from
a uniform distribution. Each is assumed to lie in the range from 4 to 63
inclusive, since this is the range used in CLOS1. The expected value of either
A1(i) or A2(i) in this range is then

(4 + ... + 63) / (63 - 4 + 1) = 33.5 ≈ 34 (rounded).

Each term

34 / (½ (DUR(I1) + DUR(I2)))

thus represents an average rate of change that can be expected in either A1(i)
or A2(i). The inequality can now be interpreted as follows: if P(I1) and P(I2)
are opposite extrema and if the sum of the rate of change in A1(i) and that of
A2(i) is greater than or equal to 25% of the sum of the average rates of change
that can be expected in A1(i) and A2(i), then P(I1) and P(I2) are not combined.
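The arithmetic behind the constant 34 and the 25% level can be checked directly. The sketch below uses hypothetical function names and exact rational arithmetic; it reproduces only the interpretation given in the text, not the program's own code.

```python
from fractions import Fraction


def expected_uniform(lo, hi):
    """Expected value of a discrete uniform variable on lo..hi inclusive."""
    values = range(lo, hi + 1)
    return Fraction(sum(values), len(values))


def quarter_of_expected_rates(dur1, dur2, mean_amplitude=34):
    """25% of the sum of the two average rates of change, where each rate
    is mean_amplitude divided by half the combined duration."""
    half_span = Fraction(dur1 + dur2, 2)
    one_rate = Fraction(mean_amplitude) / half_span
    return Fraction(1, 4) * (one_rate + one_rate)
```

Here `expected_uniform(4, 63)` evaluates to 67/2, that is 33.5, which the text rounds to 34.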


If  P(I1)  is  not  combined  with  P(I2),  then  the  processing  of  the  P-segments 
continues  until  P(SIZEP)  has  been  tested. 


If P(I2) is to be combined with P(I1), then pseudo-subroutine CREAT4 is used to
construct a new P(I1), where the beginning of P(I1) is defined by SBG(I1) + 1
and the end by SND(I2). LENSEG (the length of P(I1)) is given by DUR(I1) + DUR(I2),
and TYPE(I1) = SUST unless both P(I1) and P(I2) were "TR," in which case
TYPE(I1) = TR.

I2 is then made available for reassignment as a row number by being released
from the INUSE table. The P-matrix is recompacted. The P-segment immediately
preceding P(I1) will be called P(I0). P(I0) is examined, and if P(I0) is not a
local minimum or maximum and DUR(I0) is < 5, or if P(I0) is a local minimum or
maximum and DUR(I0) = 1, then a pseudo-segment is created by CREAT4 in P(200)
such that LENSEG = 5, SEND = SND(I0) and SBEG = SEND - 5. A new closeness
value, CLO(I1), is then computed between P(I0) (or P(200)) and P(I1). If
the duration of P(I0) plus that of P(I1) is greater than 300 msec, then
CLO(I1) = -30. The same procedure is followed for P(I1) and P(I3). The
pseudo-subroutine MINMAX is called to find the new extrema, i is reset equal to 2,
and the combining process starts all over again.

3.5.3  The  Creation  of  Beginning  and  Ending  Segments 

Recall that earlier the "noisy" segments at the beginning and end of the sample
were suppressed. An attempt is now made to create a beginning segment and an
ending segment based upon the suppressed data still resident in the Q-matrix.
Again, because of the complexity of this process, the logic is depicted in the
flow chart of Figure 3.


3.5.4  Further Suppression of Transitional Segments

For all transitional P(i) for i = 1, ..., SIZEP, we calculate:

I1 = INUSE(i-1)

I2 = INUSE(i)

I3 = INUSE(i+1)

INUSE(i) = 0

DUR(I1) = DUR(I1) + DUR(I2)/2

DUR(I3) = DUR(I3) + DUR(I2)/2

SBG(I3) = SBG(I3) - DUR(I2)/2

SND(I1) = SND(I1) + DUR(I2)/2

This results in dividing each transitional segment in half between the preceding
and following sustained segments. Of course, if a transitional segment happens
to consist of an odd number of minimal segments, the fate of the minimal segment
in the center is questionable.*
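A minimal sketch of this halving step follows. The dictionary layout and the function name are hypothetical, and the handling of an odd duration (the extra minimal segment goes to the following segment) is an assumption made only so that the example is well defined.

```python
def absorb_transitional(prev_seg, trans_seg, next_seg):
    """Split a transitional segment between its two sustained neighbors.

    Each segment is a dict with keys 'sbg' (begin), 'snd' (end) and 'dur'
    (duration in minimal segments). Returns the updated neighbors.
    """
    dur = trans_seg["dur"]
    to_prev = dur // 2           # assumption: smaller half goes backward
    to_next = dur - to_prev      # assumption: center segment goes forward
    new_prev = dict(prev_seg,
                    dur=prev_seg["dur"] + to_prev,
                    snd=prev_seg["snd"] + to_prev)
    new_next = dict(next_seg,
                    dur=next_seg["dur"] + to_next,
                    sbg=next_seg["sbg"] - to_next)
    return new_prev, new_next
```

Whatever rule is chosen for the center segment, the combined duration of the two neighbors grows by exactly the duration of the suppressed transitional segment.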


The P-matrix is then compacted, and an entirely new set of transitional segments
is artificially created as follows: for all values of i between 2 and SIZEP - 1,
we isolate all P(i) for which either (1) DUR(i) ≤ 3 or (2) P(i) is "SUST."

For each i, a variable TRINDX (which will represent a count of parameters with
transitional characteristics) is initialized to zero. For j = 5, 8, 11, 14, 17,
and 20 (the column numbers of the average parameter values A1, Z1, A2, Z2, A3,
and Z3 of the P-matrix), we proceed as follows.

If

|P(I2,j) - P(I1,j)| < max(1, T1),

where T1 is a threshold formed from P(I2,j) and P(I1,j) whose exact expression
is illegible in the source, then we set TEMP1 = 0. If, however,

|P(I2,j) - P(I1,j)| ≥ max(1, T1),

then we set

TEMP1 = P(I2,j) - P(I1,j).


Similarly, with T2 formed in the same way from P(I2,j) and P(I3,j), if

|P(I2,j) - P(I3,j)| < max(1, T2),

then we set TEMP2 = 0. If, however,

|P(I2,j) - P(I3,j)| ≥ max(1, T2),

then we set

TEMP2 = P(I2,j) - P(I3,j).


In  the  SDC  version  of  the  program,  if  a  transitional  segment  is  composed  of  an 
odd  number  r  of  minimal  segments,  then  segments  are  extended  onto  the  preced¬ 
ing  sustained  segments,  and  -Sil  segments  are  extended  onto  the  following 
sustained segment.

Now if

TEMP1 · TEMP2 ≤ 0

then we put*

TRINDX = TRINDX + 1.


At the completion of the j-loop above, TRINDX will have a value between 0 and 6.
P(I2) will be labeled TR if either of the following two tests is satisfied:

(1)  If  P(I2)  is  not  an  extremum  and  2  *  DUR(I2)  -  TRINDX  *  2, 

(2)  If  P(I2)  is  an  extremum  and  DUR(I2)  <  3  and  TRINDX  *  5. 

We now recompute closeness values for i = 2, ..., SIZEP so that

I1 = INUSE(i-1)

I2 = INUSE(i)

and CLO(I2) = closeness between P(I1) and P(I2).

If the last two logical P-segments have a duration greater than 300 msec, we put
CLO(SIZEP) = -30.

For each transitional P(i) for i = 2, ..., SIZEP, we set

I1 = INUSE(i-1)

I2 = INUSE(i)

I3 = INUSE(i+1)

If i = SIZEP, then put I3 = 0. If

(a) CLO(I1) < CLO(I2) and CLO(I2) > -8

and if either P(I3) is sustained or if it is transitional with I1 = 0,
or if:

(b) -8 < CLO(I3) ≤ CLO(I2)

and if P(I1) is transitional and P(I3) is sustained, then we set INUSE(I2) = 0
and calculate the following new average parameters:

A1(I3) = (A1(I2)·DUR(I2) + A1(I3)·DUR(I3)) / (DUR(I2) + DUR(I3))

Z1(I3) = (Z1(I2)·DUR(I2) + Z1(I3)·DUR(I3)) / (DUR(I2) + DUR(I3))

where A2(I3) and A3(I3) are calculated exactly as A1(I3), but with A2( ) or
A3( ) replacing A1( ), and Z2(I3) and Z3(I3) are calculated similarly. These new
average values are the old values weighted by the durations of their
P-segments.


*If the slope from P(I1) to P(I2) is opposite the slope from P(I2) to
P(I3), we say P(I2) has transitional characteristics, and TRINDX is
incremented by 1.
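The duration-weighted averaging can be illustrated with a short sketch. The function name and the list representation of the six average parameters are assumptions made for the example.

```python
def merge_averages(params_a, dur_a, params_b, dur_b):
    """Combine the average parameters of two P-segments, weighting each
    segment's averages by its duration (in minimal segments)."""
    total = dur_a + dur_b
    return [(a * dur_a + b * dur_b) / total
            for a, b in zip(params_a, params_b)]
```

For example, merging a one-segment P(I2) with average A1 = 10 into a three-segment P(I3) with average A1 = 20 gives (10·1 + 20·3)/4 = 17.5, so the longer segment dominates the result.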


Also calculate:

DUR(I3) = DUR(I2) + DUR(I3)

and

SBG(I3) = SBG(I3) - DUR(I2).


A new closeness value CLO(I3) is now computed between P(I1) and P(I3), since
P(I2) has been combined with P(I3). However, if
DUR(I1) + DUR(I3) > 30,
then we set CLO(I3) = -30.


If neither (a) nor (b) above is satisfied, but:

(c) CLO(I3) ≤ CLO(I2) and CLO(I2) > -8,
and if P(I1) is a sustained segment, then we set INUSE(I2) = 0 and calculate
values of A1(I1), Z1(I1), A2(I1), Z2(I1), A3(I1), and Z3(I1) in the same
manner as above but with I1 replacing I3. Also, we compute
DUR(I1) = DUR(I1) + DUR(I2)

and a new closeness value CLO(I3) between P(I1) and P(I3). However, if
DUR(I1) + DUR(I3) > 30,

then we set CLO(I3) = -30. If neither (a) nor (b) nor (c) is satisfied, we do
not combine P(I2) with either P(I1) or P(I3). If any P-segment has been combined,
then the P-matrix is recompacted and we continue the above procedure until no
more P-segments can be combined. New relative extrema are computed using
pseudo-subroutine MINMAX; pseudo-subroutine REORD (Section 3.8.5) is used to
reorder the P-matrix, SIZEP = SIZEP + 1, and segmentation is complete.

3.6  CLOSENESS  FUNCTIONS 


3.6.1  Function  CLOS1 

Function CLOS1 is a basic routine used to compute closeness values (defined
below) between rows of the Q-matrix. The input parameters to CLOS1 are:

SEGNB1 = a row number of one particular row of the Q-matrix

SEGNB2 = a row number of another row of the Q-matrix to which row SEGNB1
         is to be compared

ALFA   = a "flag" of the most variable parameter as determined by function
         VARFUN;
         ALFA = 1 if A1Q( ) is most variable
         ALFA = 2 if Z1Q( ) is most variable
         ALFA = 3 if A2Q( ) is most variable
         ALFA = 4 if Z2Q( ) is most variable
         ALFA = 5 if A3Q( ) is most variable
         ALFA = 6 if Z3Q( ) is most variable
         ALFA = 0 if none of A1Q( ), Z1Q( ), A2Q( ), Z2Q( ), A3Q( ),
                  or Z3Q( ) is most variable.

TYPC   = 1 if CLOS1 is to compute a closeness value between rows SEGNB1 and
           SEGNB2, and
       = 2 if CLOS1 is to compute closeness values for all rows in the
           Q-matrix beginning with SEGNB1 and ending with SIZEQ. CLOS1
           computes a measure of closeness between row i-1 and row i+1
           and stores the result in row i.


In the following discussion, the characters "S1" and "S2" will be used in place
of "SEGNB1" and "SEGNB2."




4  December  1970 


37 


System  Development  Corporation 
TM-4652/200/00 


CLOS1 begins by isolating all segments which characterize a fricative type "S."
These segments are defined by either one of the following two properties:

(1) A3Q(S1) ≥ A1Q(S1)
    and
    A3Q(S2) ≥ A1Q(S2)
    and
    Z3Q(S1) ≥ 60
    and
    Z3Q(S2) ≥ 60

or

(2) Z3Q(S1) ≥ 45
    and
    Z3Q(S2) ≥ 45
    and
    A1Q(S1) ≥ 6
    and
    A1Q(S2) ≥ 6.


In  the  previous  version  of  the  program  (containing  MAIN) ,  the  fricative  test  (2) 
above  contained  the  inequalities 

A1Q(S1)  <  6  and  A1Q(S2)  <  6, 

opposite  to  those  above.  The  latter  two  inequalities  appear  to  give  a  more 
reasonable  characterization  of  a  fricative  type  "S,"  since  such  a  fricative  is 
typically  of  low  A1  amplitude  and  high  Z3  frequency.  The  reason  for  the  change 
is  unknown. 
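The two fricative tests, together with the earlier variant of test (2), can be sketched as follows. The function name, the dictionary keys and the `low_a1_variant` switch are hypothetical and serve only to contrast the two versions described above.

```python
def is_fricative_s(q1, q2, low_a1_variant=False):
    """Apply the fricative type "S" tests to two Q-segments.

    q1, q2 are dicts with keys 'A1Q', 'A3Q' and 'Z3Q'.
    low_a1_variant=True uses the inequalities A1Q < 6 from the earlier
    program version in place of A1Q >= 6 in test (2).
    """
    test1 = (q1["A3Q"] >= q1["A1Q"] and q2["A3Q"] >= q2["A1Q"]
             and q1["Z3Q"] >= 60 and q2["Z3Q"] >= 60)
    if low_a1_variant:
        a1_ok = q1["A1Q"] < 6 and q2["A1Q"] < 6
    else:
        a1_ok = q1["A1Q"] >= 6 and q2["A1Q"] >= 6
    test2 = q1["Z3Q"] >= 45 and q2["Z3Q"] >= 45 and a1_ok
    return test1 or test2
```

With the inequalities as listed, a pair of segments with a strong A1 amplitude and a Z3 count of 45 or more passes test (2), whereas the earlier variant rejects it; this is the behavior the preceding paragraph questions.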


If a fricative type "S" has been detected, we set CLOS1 = 8 and return to the
segmentation algorithm. Otherwise, we proceed by establishing the following
table of parameters.


Table 7.  Parameters for CLOS1

Parameter   Parameter    LIM(j)   TEMLIM(j)   RATLIM(j)   WEIGHT(j)
            Number (j)

A1Q( )          1           4         0          0.6        15.0
Z1Q( )          2           2         2          0.3        30.0
A2Q( )          3           4         0          0.6        15.0
Z2Q( )          4           4        14          0.3        30.0
A3Q( )          5           4         0          0.6        15.0
Z3Q( )          6          10        30          0.5        30.0

CLOS1 is initialized to zero, and a logical flag (called PRFLAG) is initially set
to FALSE, indicating that the segments are similar. Let j = 1 and compute

(1) TEMP1 = max {A1Q(S1), LIM(j)},
and

(2) TEMP2 = max {A1Q(S2), LIM(j)}.

If ALFA = j, then FACT = 1.25; otherwise, FACT = 1.0. Now if

|TEMP1 - TEMP2| ≤ LIM(j),

we put

CLOS1 = CLOS1 + 2.0.


If, however,

|TEMP1 - TEMP2| > LIM(j),


then we compute

(3) RATIO = |TEMP1 - TEMP2| / (4·√(TEMP1 + TEMP2)).




If

max {TEMP1, TEMP2} ≤ TEMLIM(j),

we set

RATIO = (.7)·RATIO;

otherwise leave RATIO alone. If

RATIO > RATLIM(j),

set PRFLAG = TRUE (meaning that the segments are nonsimilar). In either
event, whether

RATIO > RATLIM(j) or RATIO ≤ RATLIM(j),

we compute

CLOS1 = CLOS1 + (5/2 - FACT·WEIGHT(j)·RATIO).

Now step j = j + 1, and return to equations (1) and (2) above with Z1Q(S1)
replacing A1Q(S1) and Z1Q(S2) replacing A1Q(S2). CLOS1 is again updated,
j = j + 1 again, and we continue with A2Q(S1) and A2Q(S2), etc., until we
reach j > 6.

If PRFLAG indicates that the segments are nonsimilar, and if

CLOS1 > -4,

we set

CLOS1 = -4.

Finally, two segments are considered "close" if the final value of CLOS1 is
> 0.
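The scoring loop just described can be sketched in a few lines. This is an illustration under stated assumptions, not the original program: the function name is hypothetical, each segment is a tuple of its six Q-matrix parameters in the order A1Q, Z1Q, A2Q, Z2Q, A3Q, Z3Q, the constants are those of Table 7, the additive constant 5/2 is the one implied by the later analysis of expression (5), and the fricative pre-test is omitted.

```python
import math

# Table 7 constants, indexed by parameter number j - 1.
LIM = [4, 2, 4, 4, 4, 10]
TEMLIM = [0, 2, 0, 14, 0, 30]
RATLIM = [0.6, 0.3, 0.6, 0.3, 0.6, 0.5]
WEIGHT = [15.0, 30.0, 15.0, 30.0, 15.0, 30.0]


def clos1_score(s1, s2, alfa=0):
    """Closeness value between two Q-segments (fricative test not included).

    alfa is the number (1 to 6) of the most variable parameter, or 0.
    """
    score = 0.0
    nonsimilar = False                      # plays the role of PRFLAG
    for j in range(6):
        temp1 = max(s1[j], LIM[j])
        temp2 = max(s2[j], LIM[j])
        fact = 1.25 if alfa == j + 1 else 1.0
        diff = abs(temp1 - temp2)
        if diff <= LIM[j]:
            score += 2.0                    # parameters agree within LIM(j)
            continue
        ratio = diff / (4.0 * math.sqrt(temp1 + temp2))
        if max(temp1, temp2) <= TEMLIM[j]:
            ratio *= 0.7                    # relax the test for small values
        if ratio > RATLIM[j]:
            nonsimilar = True
        score += 2.5 - fact * WEIGHT[j] * ratio
    if nonsimilar and score > -4.0:
        score = -4.0                        # force a "not close" result
    return score
```

Two identical segments score 2.0 on each of the six parameters, for a total of 12.0, while a single parameter whose RATIO exceeds RATLIM(j) caps the result at -4 even when the other five parameters agree.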

No interpretation currently exists for either the choice of parameter values
or the particular equations used in this routine. In an attempt to provide an
understanding, we offer the following:

To begin with, consider the column of values RATLIM(j) for j = 1, ..., 6. These
values serve as threshold parameters for the function

RATIO = |TEMP1 - TEMP2| / (4·√(TEMP1 + TEMP2)).

It can be shown that the values of RATLIM(j) are the averages of RATIO when
TEMP1 and TEMP2 are varied over the six ranges shown in Table 8.

Table 8.  Comparison of Averages of RATIO and Values of RATLIM(j)

Parameter   Range of    Range of    Average of   RATLIM(j)
            TEMP1       TEMP2       RATIO

A1Q( )       4-63        4-63         .63          .6
Z1Q( )       3-18        3-18         .30          .3
A2Q( )       4-63        4-63         .63          .6
Z2Q( )      18-44       18-44         .29          .3
A3Q( )       4-63        4-63         .63          .6
Z3Q( )      44-100      44-100        .40          .5

Thus, for each of the six variables, RATIO is compared with its average value.


The function used to generate the average for A1Q( ), A2Q( ) and A3Q( ) is

\[
\frac{1}{(63 - 4 + 1)^2}
\sum_{TEMP1 = 4}^{63} \; \sum_{TEMP2 = 4}^{63}
\frac{|TEMP1 - TEMP2|}{4\sqrt{TEMP1 + TEMP2}} .
\]


The averages for Z1Q( ), Z2Q( ) and Z3Q( ) are generated similarly. Note that
the ranges

3-18, 18-44, 44-100

are the frequency ranges of the three front-end hardware filters.
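The averages in Table 8 can be recomputed by brute force. The sketch below uses a hypothetical function name and assumes that TEMP1 and TEMP2 each take every integer value in the stated range with equal weight, as in the double sum above.

```python
import math


def average_ratio(lo, hi):
    """Average of RATIO = |t1 - t2| / (4 * sqrt(t1 + t2)) over all ordered
    pairs (t1, t2) of integers in lo..hi inclusive."""
    values = range(lo, hi + 1)
    total = sum(abs(t1 - t2) / (4.0 * math.sqrt(t1 + t2))
                for t1 in values for t2 in values)
    return total / len(values) ** 2
```

Evaluating `average_ratio(4, 63)`, `average_ratio(3, 18)`, `average_ratio(18, 44)` and `average_ratio(44, 100)` gives the four distinct entries of the "Average of RATIO" column to two decimal places.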


The first five values of the average of RATIO agree well with RATLIM(j);
however, we have disagreement with RATLIM(6). We assert that the correct
value for RATLIM(6) is 0.4 and that certain values of WEIGHT(j) are incorrect
as given. To show this, we note that the expression for RATIO appears in [1]
and in one version of the program as

(4) RATIO = |TEMP1 - TEMP2| / √(TEMP1 + TEMP2).

The given associated values for RATLIM(j), WEIGHT(j) and the computed averages
of RATIO are:


Table 9.  Data for Modified RATIO Function

j    RATLIM(j)   WEIGHT(j)   Average of RATIO
                             (computed)   (rounded)

1      2.5         4.0         2.53         2.5
2      1.2         7.5         1.19         1.2
3      2.5         4.0         2.53         2.5
4      1.2         7.5         1.15         1.2
5      2.5         4.0         2.53         2.5
6      2.0         7.5         1.59         1.6

Now, to calculate CLOS1, we must compute the expression

5/2 - FACT·WEIGHT(j)·RATIO

alternately for amplitudes and zero-crossings and add the results together.
The numbers WEIGHT(j) ensure that the amplitudes and zero-crossings are
weighted fairly; they may be justified by observing that the sum of the
average values for the amplitudes is

2.5 + 2.5 + 2.5 = 7.5,

and for zero-crossings,

1.2 + 1.2 + 1.6 = 4.0.

The assignment of proper weights is now obvious: each group is weighted by the
sum for the other group (4.0 for amplitudes, 7.5 for zero-crossings), so that
the weighted sums are equal. The appropriate value of RATLIM(6) for RATIO
in (4) is

RATLIM(6) = 1.6.

A corrected set of values of RATLIM(j) and WEIGHT(j) to be used in conjunction
with RATIO in (3) can now be given as follows:

Table 10.  Corrected Values of RATLIM(j) and WEIGHT(j)

j    RATLIM(j)   WEIGHT(j)

1      .625        16.0
2      .300        30.0
3      .625        16.0
4      .300        30.0
5      .625        16.0
6      .400        30.0

There is a further justification for the selection of RATLIM(6) = .4. If we
compute the percentage of cases in which

RATIO ≤ RATLIM(j),

and thus call the segments close, we get the following table:

Table 11.  Analysis of RATLIM(j) Values

Range of    Range of    RATLIM(j)    %
TEMP1       TEMP2

 4-63        4-63         .625       54
 3-18        3-18         .300       55
 4-63        4-63         .625       54
18-44       18-44         .300       57
 4-63        4-63         .625       54
44-100      44-100        .400       55
44-100      44-100        .500       66

In other words, using RATLIM(6) = .5 as given results in 66% of the values of
RATIO being less than or equal to RATLIM(6). The more reasonable value
appears again to be RATLIM(6) = .4.
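The percentages in Table 11 can be approximated with a direct count. This is a sketch with a hypothetical function name; it assumes every ordered pair of integer values in the range is counted once, which may differ in detail from how the original figures were produced.

```python
import math


def percent_close(lo, hi, ratlim):
    """Percentage of ordered pairs (t1, t2) in lo..hi for which
    RATIO = |t1 - t2| / (4 * sqrt(t1 + t2)) does not exceed ratlim."""
    values = range(lo, hi + 1)
    hits = sum(1 for t1 in values for t2 in values
               if abs(t1 - t2) / (4.0 * math.sqrt(t1 + t2)) <= ratlim)
    return 100.0 * hits / len(values) ** 2
```

Comparing `percent_close(44, 100, 0.5)` with `percent_close(44, 100, 0.4)` shows the effect discussed above: the larger threshold admits a noticeably larger share of pairs than any of the other five rows.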


We will now provide justification for the use of the formula

CLOS1 = CLOS1 + (5/2 - FACT·WEIGHT(j)·RATIO).

For each individual parameter (A1Q( ), Z1Q( ), etc.), we calculate

(5) 5/2 - FACT·WEIGHT(j)·RATIO.

For amplitudes, WEIGHT(j) = 16.0, and so the expression in (5) (assuming
FACT = 1.0) is > 0 if

RATIO < 5 / (2·WEIGHT(j)) = 5/32 = (.25)(.625).

In order for corresponding amplitudes for two segments to be considered close,
RATIO must therefore be less than 25% of RATLIM(j).


For zero-crossings, WEIGHT(j) = 30.0. The average value of RATLIM(j) for
all three zero-crossing ranges is

(.300 + .300 + .400) / 3 = 1/3.

The expression in (5) (assuming FACT = 1.0) is > 0 if

RATIO < 5 / (2·WEIGHT(j)) = 5/60 = (.25)(.333...).

In order for corresponding zero-crossings for two segments to be considered
close, RATIO must therefore be less than 25% of the average value of RATLIM(j)
for all three zero-crossing ranges.
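The 25% figure follows from simple arithmetic, which the sketch below reproduces with exact fractions. The function name is hypothetical; the weights and thresholds are the corrected values of Table 10.

```python
from fractions import Fraction


def positive_contribution_limit(weight, fact=1):
    """Value of RATIO below which 5/2 - FACT*WEIGHT*RATIO is positive."""
    return Fraction(5, 2) / (Fraction(fact) * Fraction(weight))


# Amplitudes: WEIGHT = 16, RATLIM = .625 = 5/8.
amplitude_share = positive_contribution_limit(16) / Fraction(5, 8)

# Zero-crossings: WEIGHT = 30, mean RATLIM = (.3 + .3 + .4) / 3 = 1/3.
zero_crossing_share = positive_contribution_limit(30) / Fraction(1, 3)
```

Both shares come out to exactly 1/4, which is why the same 25% level applies to amplitudes and zero-crossings once the corrected values are used.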


3.6.2  Function  PROXIM 

Function PROXIM computes closeness values between segments of the P-matrix in
the same manner as CLOS1 calculates closeness values between segments of the
Q-matrix. The input parameters to PROXIM are:

SEGNB1 = a row number of one particular row of the P-matrix,

SEGNB2 = a row number of another row of the P-matrix to which row
         SEGNB1 is to be compared,

ALFA   = a "flag" of the most variable parameter as determined by function
         VARFUN;
         ALFA = 1 if A1( ) is most variable
              = 2 if Z1( ) is most variable
              = 3 if A2( ) is most variable
              = 4 if Z2( ) is most variable
              = 5 if A3( ) is most variable
              = 6 if Z3( ) is most variable
              = 0 if none of A1( ), Z1( ), A2( ), Z2( ), A3( )
                  or Z3( ) is most variable.

PROXIM performs exactly the same operations as CLOS1, but bases the computations
on the average parameters A1( ), Z1( ), A2( ), Z2( ), A3( ) and Z3( ), and uses
a different set of weights WEIGHT(j) and a different set of values of RATLIM(j).

Some confusion exists as to the correct values for WEIGHT(j) and RATLIM(j).

At least three different sets of values are in existence: one set is given in
[1], another appears in an early version of the program (that in which MAIN is
used for recording), and a third is given in a later version (that in which RECORD
replaces MAIN). Table 12 lists these three sets.


Table 12.  Comparison of Various WEIGHT(j) and RATLIM(j) Values for PROXIM

         [1]              Version with MAIN      Version with RECORD

WEIGHT(j)  RATLIM(j)    WEIGHT(j)  RATLIM(j)    WEIGHT(j)  RATLIM(j)

   4.0        2.0          4.0        2.0         15.0        .5
   6.0        1.0          6.0        1.0         25.0        .25
   4.0        2.0          4.0        2.5         15.0        .5
   5.0        1.0          6.0        1.0         20.0        .25
   4.0        2.0          4.0        2.5         15.0        .5
   5.0        1.6          6.0        1.6         20.0        .4

The RATIO function used in the version of the program with MAIN is

RATIO = |TEMP1 - TEMP2| / √(TEMP1 + TEMP2),

and in the version with RECORD,

RATIO = |TEMP1 - TEMP2| / (4·√(TEMP1 + TEMP2)).

The SDC version of the segmentation program was checked out by taking Q-matrices
which were generated by the Stanford program and comparing the P-matrices
created by the two programs. The results agreed with the Stanford version
with MAIN, but the results did not agree with the version with RECORD. The
sets of values of WEIGHT(j) and RATLIM(j) were then modified by multiplying
the values of WEIGHT(j) in the MAIN version by 4.0 and dividing the related
values of RATLIM(j) by 4.0. The SDC version with these modified values did
check out with the Stanford RECORD version. These values are:

WEIGHT(j)   RATLIM(j)

  16.0         .5
  24.0         .25
  16.0         .6
  24.0         .25
  16.0         .6
  24.0         .4

Note that the RECORD values for RATLIM(j) are 1/4 of the RATLIM(j) given by
[1], rather than 1/4 of the MAIN RATLIM(j) values. The RECORD values for
WEIGHT(j) appear to have been generated in a similar manner by multiplying
WEIGHT(j) in [1] by 4.0, but there is not an exact correlation. Since the
SDC version with the modified values agrees with the Stanford RECORD version,
we assume that the values of WEIGHT(j) and RATLIM(j) in the Stanford RECORD
version were modified before test results were generated for SDC at Stanford.

*TEMP1 and TEMP2 are defined above in the section on CLOS1 and have the same
meaning in PROXIM.

However, based upon the previous analysis of the values of WEIGHT(j) and
RATLIM(j) in function CLOS1, it is felt that none of the previous sets of
WEIGHT(j) and RATLIM(j) for PROXIM are correct. In fact, it appears that the
RATLIM(j) values for PROXIM were originally intended to be 80% of those for
CLOS1. Based upon this assumption, we assert that the correct set of
RATLIM(j) and WEIGHT(j) is:

RATLIM(j)   WEIGHT(j)

   .50        12.8
   .24        24.0
   .50        12.8
   .24        24.0
   .50        12.8
   .32        24.0

The use of this last set of values in PROXIM results in a relaxation of the closeness criteria for P-segments, as originally intended in [1]. Indeed, in [1], p. 67, we are told "... as we are dealing with average parameters (i.e., comparing corresponding values of A1( ), Z1( ), A2( ), Z2( ), A3( ) and Z3( ) of two P-segments), the weights are decreased to make the procedure less sensitive to smaller variations." It is instructive to quantify this last statement and see how much less sensitive the procedure becomes with the decreased weights. For corresponding average amplitudes of two P-segments to be considered close, we require that

$$\tfrac{5}{2} - \mathrm{FACT} \cdot \mathrm{WEIGHT}(j) \cdot \mathrm{RATIO} > 0.$$

Take FACT = 1.0. Then since WEIGHT(j) = 12.8 for amplitudes, the requirement becomes

$$\mathrm{RATIO} < \frac{5}{2 \cdot \mathrm{WEIGHT}(j)} = \frac{5}{25.6} = (.39)(.50),$$

and RATIO must therefore be within 39% of RATLIM(j), as opposed to 25% in CL0S1.

For zero-crossings, WEIGHT(j) = 24.0. The average value of RATLIM(j) for all three zero-crossing ranges is

$$\frac{.24 + .24 + .32}{3} = \frac{.8}{3} = \frac{4}{15}.$$

We then have that in this case,

$$\tfrac{5}{2} - \mathrm{FACT} \cdot \mathrm{WEIGHT}(j) \cdot \mathrm{RATIO} > 0$$

becomes

$$\mathrm{RATIO} < \frac{5}{2 \cdot \mathrm{WEIGHT}(j)} = \frac{5}{48} = (.39)\left(\frac{4}{15}\right).$$

In order for corresponding zero-crossings for two P-segments to be considered close, RATIO must therefore be within 39% of the average value of RATLIM(j) for all three zero-crossing ranges.
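
The 39% figure can be checked numerically. The short Python sketch below assumes that the closeness test reduces to RATIO < 2.5 / (FACT x WEIGHT(j)); the bound 2.5 is inferred from the figures 25.6 and (.39)(.50) quoted in the text, and the function name relative_tolerance is illustrative only and does not appear in the program.

```python
def relative_tolerance(weight, ratlim, fact=1.0, bound=2.5):
    """Largest allowed RATIO, expressed as a fraction of RATLIM(j).

    Assumption: the closeness test has the form
        bound - fact * weight * RATIO > 0,
    with bound = 5/2 inferred from the numbers quoted in the text.
    """
    return bound / (fact * weight) / ratlim


# Amplitudes: WEIGHT(j) = 12.8, RATLIM(j) = .50
amp = relative_tolerance(12.8, 0.50)

# Zero-crossings: WEIGHT(j) = 24.0, average RATLIM(j) = (.24 + .24 + .32) / 3
zc = relative_tolerance(24.0, (0.24 + 0.24 + 0.32) / 3)

print(round(amp, 2), round(zc, 2))
```

Under this assumption both cases give the same fraction, which is consistent with the single 39% figure used for amplitudes and zero-crossings alike.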

3.7 FUNCTION VARFUN

The purpose of VARFUN is to flag the most variable of the parameters A1(i), Z1(i), A2(i), Z2(i), A3(i), Z3(i) for a given P-segment P(i) in the sense of the definition given below. The sole input variable to VARFUN is the row number i of the given P-segment. VARFUN returns with one of the following seven values:

    VARFUN = 1 indicates that A1(i) is most variable
    VARFUN = 2 indicates that Z1(i) is most variable
    VARFUN = 3 indicates that A2(i) is most variable
    VARFUN = 4 indicates that Z2(i) is most variable
    VARFUN = 5 indicates that A3(i) is most variable
    VARFUN = 6 indicates that Z3(i) is most variable
    VARFUN = 0 indicates that none of A1(i), Z1(i), A2(i), Z2(i), A3(i), or Z3(i) is most variable.

VARFUN begins by trying to isolate a fricative type "S", characterized as follows:*

    Z3(i) ≥ 40 and A1MX(i) ≤ [value illegible], and A3MX(i) ≥ A1MX(i).

If, in addition to the above three tests, we also have that

    Z3MN(i) < 30,

then Z3(i) is called most variable and we set

    VARFUN = 6

and return. However, if

    Z3MN(i) ≥ 30,

we set

    VARFUN = 0

and return.

If a fricative type "S" cannot be identified, we proceed by defining a function called VARLIM dependent upon the duration DUR(i) of the segment:

$$\mathrm{VARLIM} = \begin{cases} 3 & \text{if } \mathrm{DUR}(i) < 6, \\ 4 - \dfrac{\mathrm{DUR}(i)}{\text{[illegible]}} & \text{if } 6 \le \mathrm{DUR}(i) < 12, \\ 2 + \text{[illegible]} & \text{if } \mathrm{DUR}(i) \ge 12. \end{cases}$$

*Note that this test for a fricative type "S" differs from that used in CL0S1.
The test for variability results from calculations performed serially on the six parameters A1(i), Z1(i), A2(i), Z2(i), A3(i), and Z3(i). Table 13 contains the values of the variables XLIM1(j) and FACT(j) used in the computations:

Table 13. Parameters for VARFUN

    j    XLIM1(j)    FACT(j)

    1      6.0        1.75
    2      2.0        2.0
    3      4.0        1.75
    4      4.0        2.0
    5      4.0        1.25
    6     10.0        1.25

The calculations are the same for each of the six parameters, with appropriate values of XLIM1(j) and FACT(j) used; these computations will be illustrated by those for A1(i):

Define

    V1 = max {A1MN(i), XLIM1(1)}

and

    V2 = max {A1MX(i), XLIM1(1)}.

If

    |V1 - V2| ≤ XLIM1(1),

then the parameter A1(i) is considered to be not variable, and we begin with the same tests for Z1(i) and continue until we find V1 and V2 and a j such that

    |V1 - V2| > XLIM1(j).

If no such V1, V2, and j can be found, VARFUN = 0, and we return. Otherwise, set

$$\mathrm{FACT1} = \begin{cases} .75 & \text{if } \mathrm{V1} + \mathrm{V2} < 10.0, \\ 1.0 & \text{otherwise.} \end{cases}$$

If

$$\frac{\mathrm{V1} + \mathrm{V2}}{|\mathrm{V1} - \mathrm{V2}|} \ge \mathrm{VARLIM} \cdot \mathrm{FACT1} \cdot \mathrm{FACT}(1),$$

then the parameter A1(i) is considered to be not variable. Otherwise, the ratio

$$\frac{|\mathrm{V1} - \mathrm{V2}| \cdot \mathrm{VARLIM} \cdot \mathrm{FACT1} \cdot \mathrm{FACT}(1)}{\mathrm{V1} + \mathrm{V2}}$$

is saved, along with the parameter number (1 for A1(i)). After performing these calculations for all six parameters, we find the largest of the stored ratios, and VARFUN is set equal to the corresponding parameter number.
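
The variability test can be sketched in Python as follows. This is only an illustration of the steps described above, not the original routine: the fricative pre-test is omitted, VARLIM is passed in by the caller rather than computed from DUR(i), and the names varfun_core, mins, and maxs are hypothetical. The lists mins and maxs are assumed to hold the minimum and maximum values of A1, Z1, A2, Z2, A3, Z3 (in that order) for one P-segment.

```python
# Table 13 values, indexed 0..5 for parameters A1, Z1, A2, Z2, A3, Z3.
XLIM1 = [6.0, 2.0, 4.0, 4.0, 4.0, 10.0]
FACT = [1.75, 2.0, 1.75, 2.0, 1.25, 1.25]


def varfun_core(mins, maxs, varlim):
    """Return the number (1..6) of the most variable parameter, or 0 if none.

    mins, maxs: six minimum and six maximum parameter values (assumed layout).
    varlim: the duration-dependent limit VARLIM, supplied by the caller.
    """
    best_ratio = None
    best_param = 0
    for j in range(6):
        v1 = max(mins[j], XLIM1[j])
        v2 = max(maxs[j], XLIM1[j])
        diff = abs(v1 - v2)
        if diff <= XLIM1[j]:
            continue  # spread too small: parameter is not variable
        fact1 = 0.75 if v1 + v2 < 10.0 else 1.0
        limit = varlim * fact1 * FACT[j]
        if (v1 + v2) / diff >= limit:
            continue  # relative spread too small: not variable
        ratio = diff * limit / (v1 + v2)
        if best_ratio is None or ratio > best_ratio:
            best_ratio, best_param = ratio, j + 1
    return best_param
```

For example, with VARLIM = 3 a segment whose A1 ranges from 6 to 40 while the other parameters stay within their XLIM1(j) limits is flagged with the value 1.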


Extensive computations were performed in attempting to identify the origin of the values FACT(j) and XLIM1(j). These calculations consisted of averaging the function

$$\frac{\mathrm{V1} + \mathrm{V2}}{|\mathrm{V1} - \mathrm{V2}|}$$

over various ranges of V1 and V2, and combining the averages with various values of VARLIM for different durations. No correlations have yet been found.


3.8  PSEUDO-SUBROUTINES  USED  BY  THE  SEGMENTATION  PROGRAM 

The following routines are used by the segmentation program as internal subroutines: return addresses are assigned from one routine to the next and eventually back to segmentation.
3.8.1 Pseudo-Subroutine SEARCH

The purpose of SEARCH is to find and create P-matrix segments from the Q-matrix segments beginning with Q(IS1) and continuing through and including Q(IS2).* The segments are created on the basis of the closeness values of the Q-segments. Sustained segments consist of a string of minimal adjacent segments having a positive closeness value and include as their first minimal segment the previously adjacent minimal segment with a negative closeness value, as discussed earlier. All other minimal segments not a part of sustained segments become grouped into transitional segments. SEARCH calls the CREAT1 pseudo-subroutine to build the P-matrix row. The exit from SEARCH is to the return address RETSEA.
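
The grouping rule can be illustrated with a small Python sketch. It rests on assumptions not spelled out in this section: each minimal segment is taken to carry a single closeness value, a value of zero is treated like a negative one, and segments are numbered from 0. The function name group_segments is hypothetical.

```python
def group_segments(closeness):
    """Split minimal segments into sustained ("SUST") and transitional ("TR") runs.

    closeness: one closeness value per minimal segment (assumed layout).
    Returns a list of (type, first_index, last_index) tuples, 0-based.
    """
    n = len(closeness)
    sustained = []
    k = 0
    while k < n:
        if closeness[k] > 0:
            # The preceding non-positive segment opens the sustained run.
            first = k - 1 if k > 0 else k
            last = k
            while last + 1 < n and closeness[last + 1] > 0:
                last += 1
            sustained.append((first, last))
            k = last + 1
        else:
            k += 1

    groups = []
    pos = 0
    for first, last in sustained:
        if pos < first:
            groups.append(("TR", pos, first - 1))  # leftover segments
        groups.append(("SUST", first, last))
        pos = last + 1
    if pos < n:
        groups.append(("TR", pos, n - 1))
    return groups
```

Each tuple returned here corresponds to one P-matrix row that SEARCH would hand to CREAT1.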


3.8.2 Pseudo-Subroutines CREAT1, CREAT2, and CREAT4

The CREAT pseudo-subroutines are used to compute the values for a P-matrix row.


3.8.2.1 CREAT1

The inputs to CREAT1 are:

    SEND   = the number of the ending Q-segment of the P-matrix row,
    LENSEG = the number of Q-matrix segments in the P-matrix row, and
    TYPE   = either TR or SUST.

Pseudo-subroutine SEARCH defines a set of Q-segments to be combined into one P-segment on the basis of the closeness values between these Q-segments. SEARCH then calls CREAT1, which allocates the appropriate P-matrix row by assigning a P-matrix row number to SEGNB1 from the AVALBL table and incrementing SEGNB2 (which is the logical row number of the P-segment). It must be noted, however, that SEGNB2 is initialized to zero by the SEGMENT subroutine prior to the first call to SEARCH. CREAT1 then computes SBEG = SEND - LENSEG.

*IS1 and IS2 are defined in the segmentation routine before entering SEARCH; IS1 is the initial Q-matrix segment number and IS2 is the final Q-matrix segment number which define the boundaries of the Q-matrix segments to be combined into P-matrix segments.

The minimum, average, and maximum values for each of the parameters A1Q( ), Z1Q( ), A2Q( ), Z2Q( ), A3Q( ), and Z3Q( ) are computed from Q(SBEG+1) through Q(SEND), as discussed earlier in the construction of the P-matrix.

The following variables are then set:

    NAT(SEGNB1)   = TYPE (either "TR" or "SUST")
    INUSE(SEGNB2) = SEGNB1 (the physical P-matrix row number)
    BPT(SEGNB1)   = SEGNB2 (the logical P-matrix row number)
    SBG(SEGNB1)   = SBEG
    SND(SEGNB1)   = SEND
    DUR(SEGNB1)   = LENSEG

CREAT1 exits through the RETCRE return address.

3.8.2.2 CREAT2

The inputs to CREAT2 are:

    SEND   = the number of the ending Q-segment of the P-matrix row
    SBEG   = (the number of the beginning Q-segment) - 1 of the P-matrix row
    SEGNB1 = the number of the P-matrix row for which the P-matrix parameters are to be recomputed
    SEGNB2 = the logical row number of the P-segment
    TYPE   = either "TR" or "SUST."

CREAT2 begins by computing LENSEG = SEND - SBEG and BETA = (LENSEG - 2)/3, where the division is an integer division.

This means that:

    BETA = 0 if  0 ≤ LENSEG ≤ 4
    BETA = 1 if  5 ≤ LENSEG ≤ 7
    BETA = 2 if  8 ≤ LENSEG ≤ 10
    BETA = 3 if 11 ≤ LENSEG ≤ 13
    BETA = 4 if 14 ≤ LENSEG ≤ 16
    etc.

CREAT2 allocates the P-matrix row by assigning a P-matrix row number to SEGNB1 from the AVALBL table and incrementing SEGNB2.

The minimum, average, and maximum values for each of the parameters A1Q( ), Z1Q( ), A2Q( ), Z2Q( ), A3Q( ), and Z3Q( ) are computed from segment Q(SBEG+1+BETA) to Q(SEND-BETA). Thus, the parameters are computed on the basis of values in the inner Q-segments and not for the total range of the Q-segments.

The rationale behind this has not been documented, but we guess one of the reasons to be a by-product of considering transitional segments to be null segments and extending them onto surrounding sustained segments. The duration of the new sustained segment consists of the duration of the transitional segment plus that of the old sustained segment. All of the parameter values of the transitional segment, if included in the computed representative values for the new sustained P-segment, might degrade the purer P-segment values computed only on the basis of the inner segment. Thus, in the case of P-matrix rows computed by CREAT2 or CREAT4, the average parameter values do not represent the average over the duration of the entire segment. We then compute:

    NAT(SEGNB1)   = TYPE (either "TR" or "SUST")
    INUSE(SEGNB2) = SEGNB1 (the physical P-matrix row number)
    BPT(SEGNB1)   = SEGNB2 (the logical P-matrix row number)
    SBG(SEGNB1)   = SBEG
    SND(SEGNB1)   = SEND
    DUR(SEGNB1)   = LENSEG

CREAT2 exits through the RETCRE return address.
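
The trimming performed by CREAT2 can be sketched in Python. The sketch assumes that BETA is the truncated integer quotient (LENSEG - 2)/3, which reproduces the ranges tabulated above, and that the "average" is a plain arithmetic mean; the construction of the P-matrix is described elsewhere and may differ. The names beta, inner_range, and summarize are hypothetical, and q is assumed to map a 1-based Q-segment number to its six parameter values.

```python
def beta(lenseg):
    """Number of Q-segments trimmed from each end (truncating division)."""
    return int((lenseg - 2) / 3)


def inner_range(sbeg, send):
    """First and last Q-segment numbers actually used by CREAT2."""
    b = beta(send - sbeg)
    return sbeg + 1 + b, send - b


def summarize(q, first, last):
    """Per-parameter minimum, mean, and maximum over Q(first)..Q(last)."""
    rows = [q[k] for k in range(first, last + 1)]
    columns = list(zip(*rows))
    minimum = [min(c) for c in columns]
    average = [sum(c) / len(c) for c in columns]  # assumed: arithmetic mean
    maximum = [max(c) for c in columns]
    return minimum, average, maximum
```

For a segment with SBEG = 0 and SEND = 8, LENSEG is 8 and BETA is 2, so only Q(3) through Q(6) contribute to the stored parameter values, while DUR still records the full length of 8.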
3.8.2.3 CREAT4

The inputs to CREAT4 are:

    SEND   = the number of the ending Q-segment of the P-matrix row
    SBEG   = (the number of the beginning Q-segment) - 1 of the P-matrix row
    SEGNB1 = the number of the P-matrix row for which the P-matrix parameters are to be computed
    SEGNB2 = the logical row number of the P-segment
    TYPE   = either "TR" or "SUST"
    LENSEG = either the length of the segment (in terms of minimal segments) or 5 (the segmentation subroutine sets LENSEG = 5 for segments less than 50 msec long).

CREAT4 begins by computing BETA = (LENSEG - 2)/3, using integer division, and then proceeds to compute the minimum, average, and maximum parameter values as per CREAT2.

3.8.3 Pseudo-Subroutine COMPAC

The pseudo-subroutine COMPAC is used to consolidate or pack the INUSE( ) table and to reassign BPT( ) values. No further explanation will be given other than to note that BPT(INUSE(i)) is the logical row number for the physical INUSE(i) P-segment.

3.8.4 Pseudo-Subroutine SORT

The pseudo-subroutine SORT sorts the INUSE( ) table into logical P-segment order (i.e., INUSE(1) points to the P-segment having the lowest beginning Q-segment number, INUSE(2) points to the P-segment having the next higher beginning Q-segment number, etc.). The BPT( ) table is reset so that the BPT( ) entry corresponding to the physical P-row number for the last logical P-row points to the logical row SIZEP.

3.8.5  Pseudo-Subroutine  REORD 

The  pseudo-subroutine  REORD  actually  moves  the  physical  P-matrix  rows  into  the 
proper  logical  order,  i.e.,  physical  row  1  is  logical  row  1,  etc. 
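
The table bookkeeping done by COMPAC and SORT can be sketched in Python under simplifying assumptions: INUSE is held as a plain list of physical row numbers (0 marking an unused entry), SBG is a mapping from physical row number to beginning Q-segment number, and BPT is rebuilt as a dictionary. The function names are hypothetical, and the sketch says nothing about how the original routines are coded.

```python
def compac(inuse):
    """Drop the zero (unused) entries from the INUSE table."""
    return [row for row in inuse if row != 0]


def sort_inuse(inuse, sbg):
    """Order physical rows by their beginning Q-segment number."""
    return sorted(inuse, key=lambda row: sbg[row])


def rebuild_bpt(inuse):
    """Map each physical row to its 1-based logical row number,
    so that bpt[inuse[k - 1]] == k for every logical row k."""
    return {row: k for k, row in enumerate(inuse, start=1)}


inuse = [5, 0, 2, 7]
sbg = {5: 10, 2: 0, 7: 4}
ordered = sort_inuse(compac(inuse), sbg)
bpt = rebuild_bpt(ordered)
```

Keeping the sort on the small INUSE table, and moving whole P-matrix rows only in REORD, is what lets the logical order change without copying rows each time.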
3.8.6 Pseudo-Subroutine MINMAX

The  pseudo-subroutine  MINMAX  determines  which  P-segments  are  local  maxima  or 
minima.  A  complete  discussion  of  the  MINMAX  function  can  be  found  in  Section 
3.4.3  on  secondary  segmentation. 

3.8.7 Pseudo-Subroutine SUPNOI

The pseudo-subroutine SUPNOI is used to suppress the noisy segments at the beginning and the end of the speech sample. Noisy segments are defined as being those adjacent segments from the beginning minimal segment forward for which A1MX(i) < 8 or Z3MX(i) ≤ 40; or those adjacent segments from the ending minimal segment backwards for which A1MX(i) < 8 or Z3MX(i) ≤ 40.

3.9  CONTROL  OF  THE  P-MATRIX * 

Ideally the P-matrix consists of physical rows that are in logical segment order. This is true for primary segmentation. However, whenever a segment is broken into multiple segments or two segments are combined into one segment, the P-matrix would have to be physically sorted to maintain a physical order corresponding to its logical order. To conserve time, a logical table pointing to physical rows is used instead.

During primary segmentation, we set SEGNB2 = 0 (the logical row number or the count of the segments in the P-matrix). The AVALBL table is used to assign physical row numbers and is initialized as follows:

    AVALBL(1)   = 2
    AVALBL(2)   = 2
    AVALBL(3)   = 3
       ...
    AVALBL(i)   = i
       ...
    AVALBL(202) = 202.

*This  section  will  be  of  interest  mainly  to  programmers. 
AVALBL(1) points to the current AVALBL( ) slot (i.e., AVALBL(AVALBL(1))), which contains the physical P-matrix row number to assign to the logical P-matrix row. When pseudo-subroutine SEARCH calls pseudo-subroutine CREAT1 to create a logical row, SEGNB1 (which is the physical row number) is set equal to AVALBL(AVALBL(1)), and AVALBL(1) is incremented by one. SEGNB2 (which is the logical row number) is incremented by one also. CREAT1 sets up the P-matrix row and the INUSE( ) indicator.

The controls of the subscripts of the rows are established as follows:

The physical row BPT(SEGNB1) points to the logical row number SEGNB2, and the logical row INUSE(SEGNB2) points to the physical row number SEGNB1. At the completion of primary segmentation, SIZEP = SEGNB2, the count of the number of logical rows in the P-matrix.

In order to combine two segments P(i) and P(i+1), the following is done:

    AVALBL(1)         = AVALBL(1) - 1
    AVALBL(AVALBL(1)) = i+1
    INUSE(i+1)        = 0.

In order to break up one segment P(i) into two segments, the following is done:

    AVALBL(1)         = AVALBL(1) - 1
    AVALBL(AVALBL(1)) = i
    INUSE(i)          = 0.

SEARCH and CREAT are then called to make up two new segments on the basis of closeness values computed by function CL0S1 or PROXIM. The following calculations are then performed:

    INUSE(SIZEP)           = AVALBL(AVALBL(1))
    BPT(AVALBL(AVALBL(1))) = SIZEP
    AVALBL(1)              = AVALBL(1) + 1
    INUSE(SIZEP+1)         = AVALBL(AVALBL(1))
    BPT(AVALBL(AVALBL(1))) = SIZEP+1
    AVALBL(1)              = AVALBL(1) + 1

The COMPAC pseudo-subroutine is called to compress the INUSE( ) table. It eliminates the zero INUSE( ) entries. The SORT pseudo-subroutine is called to sort the INUSE( ) table into logical order. The REORD pseudo-subroutine is called to actually move P-matrix rows into logical order and reassign INUSE( ) values. After reordering, P(2) is logical row 1, P(3) is logical row 2, etc.
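
The row-allocation scheme can be illustrated with a minimal Python sketch. It models only the table updates described in this section; the class name PMatrixControl and its method names are hypothetical, and passing the physical and logical row numbers to release as two separate arguments is an assumption made for clarity (the text uses a single index in both places).

```python
class PMatrixControl:
    """Free-slot bookkeeping for P-matrix rows (illustrative sketch only)."""

    def __init__(self, size=202):
        # Tables are 1-based as in the text; index 0 is unused.
        self.avalbl = list(range(size + 1))  # AVALBL(i) = i
        self.avalbl[1] = 2                   # AVALBL(1) points to the current slot
        self.inuse = [0] * (size + 1)        # logical row  -> physical row
        self.bpt = [0] * (size + 1)          # physical row -> logical row
        self.segnb2 = 0                      # count of logical rows

    def allocate(self):
        """Assign the next free physical row to a new logical row."""
        segnb1 = self.avalbl[self.avalbl[1]]
        self.avalbl[1] += 1
        self.segnb2 += 1
        self.inuse[self.segnb2] = segnb1
        self.bpt[segnb1] = self.segnb2
        return segnb1, self.segnb2

    def release(self, physical_row, logical_row):
        """Return a physical row to the free table and clear its INUSE entry."""
        self.avalbl[1] -= 1
        self.avalbl[self.avalbl[1]] = physical_row
        self.inuse[logical_row] = 0


control = PMatrixControl()
first = control.allocate()
second = control.allocate()
third = control.allocate()
control.release(3, 2)
reused = control.allocate()
```

In this sketch the first physical row handed out is 2, which agrees with the remark that P(2) is logical row 1 after reordering, and a released row is the next one to be reused.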




4  December  1970 




System  Development  Corporation 
TM-4652/200  00 


APPENDIX 


Let $\{f_i\}$ for $i = 1, \ldots, n$ be a discrete function representing an arbitrary wave within a minimal segment. The amplitude of the wave on the minimal segment was defined to be

$$\max_{1 \le i \le n} \{f_i\} - \min_{1 \le i \le n} \{f_i\}.$$

We shall show that for sine waves, this expression is proportional to the square root of the average power of the signal over the minimal segment.


The average power of a continuous signal $f(t)$ is defined in [5] to be

$$\overline{|f(t)|^2} = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |f(t)|^2 \, dt.$$

In the case when $f(t)$ is discrete, the interval $[-T, T]$ can be decomposed into subintervals:

$$-T = t_{-n} < t_{-n+1} < \cdots < t_0 < \cdots < t_{n-1} < t_n = T,$$

such that

$$f(t) = f_j \quad \text{for } t_j < t \le t_{j+1} \quad (j = -n, \ldots, -1)$$

and

$$f(t) = f_{j+1} \quad \text{for } t_j < t \le t_{j+1} \quad (j = 0, 1, \ldots, n-1).$$
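If the points $\{t_j\}$ are evenly spaced, say

$$t_j = t_0 + jh \quad (j = -n, \ldots, 0, \ldots, n),$$

where $h > 0$, then

$$\frac{1}{2T} = \frac{1}{t_n - t_{-n}} = \frac{1}{t_0 + nh - (t_0 - nh)} = \frac{1}{2nh}.$$

Therefore,

$$\frac{1}{2T} \int_{-T}^{T} |f(t)|^2 \, dt = \frac{1}{2nh} \sum_{\substack{j=-n \\ j \ne 0}}^{n} |f_j|^2 \cdot h = \frac{1}{2n} \sum_{\substack{j=-n \\ j \ne 0}}^{n} |f_j|^2,$$

and so

$$\overline{|f(t)|^2} = \lim_{n \to \infty} \frac{1}{2n} \sum_{\substack{j=-n \\ j \ne 0}}^{n} |f_j|^2.$$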

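In the case when the discrete signal represents a sine wave, we may take

$$f_j = A \sin \omega t_j \quad \text{and} \quad t_0 = 0.$$

Assume that $t_n - t_{-n} > 2\pi/\omega$, so that $\{f_j\}$ contains at least one full cycle between $t_{-n}$ and $t_n$. Then

$$f_{-j} = A \sin \omega t_{-j} = A \sin \omega(-jh) = -A \sin \omega jh = -A \sin \omega t_j = -f_j.$$

Hence,

$$\overline{|f(t)|^2} = \lim_{n \to \infty} \frac{1}{2n} \sum_{\substack{j=-n \\ j \ne 0}}^{n} |f_j|^2 = \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} |f_j|^2.$$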
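To evaluate this last sum, note that

$$|f_j|^2 = A^2 \sin^2 \omega jh = A^2 \left( \frac{1 - \cos 2\omega jh}{2} \right) = \frac{A^2}{2} - \frac{A^2}{2} \operatorname{Re} e^{2\omega jhi}, \quad (i = \sqrt{-1}).$$

Therefore,

$$\sum_{j=1}^{n} |f_j|^2 = \frac{nA^2}{2} - \frac{A^2}{2} \operatorname{Re} \sum_{j=1}^{n} e^{2\omega jhi} = \frac{nA^2}{2} - \frac{A^2}{2} \operatorname{Re} \left\{ e^{2\omega hi} \, \frac{1 - e^{2n\omega hi}}{1 - e^{2\omega hi}} \right\} = \frac{nA^2}{2} - \frac{A^2}{2} \operatorname{Re} \left\{ e^{2\omega hi} \, \frac{(1 - e^{2n\omega hi})(1 - e^{-2\omega hi})}{2 - 2\cos 2\omega h} \right\}.$$

Now

$$\left| \frac{A^2}{2} \operatorname{Re} \left\{ e^{2\omega hi} \, \frac{(1 - e^{2n\omega hi})(1 - e^{-2\omega hi})}{2 - 2\cos 2\omega h} \right\} \right| \le \frac{A^2}{2} \cdot \frac{4}{2 - 2\cos 2\omega h} = \frac{A^2}{1 - \cos 2\omega h} = \text{const},$$

so that

$$\left| \frac{1}{n} \sum_{j=1}^{n} |f_j|^2 - \frac{A^2}{2} \right| \le \frac{\text{const}}{n}.$$

Thus, letting $n \to \infty$,

$$\overline{|f(t)|^2} = \frac{A^2}{2}.$$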
But since

$$\max_{1 \le i \le n} \{f_i\} - \min_{1 \le i \le n} \{f_i\} = A - (-A) = 2A,$$

we get that

$$\max_{1 \le i \le n} \{f_i\} - \min_{1 \le i \le n} \{f_i\} = 2\sqrt{2} \, \sqrt{\overline{|f(t)|^2}}.$$
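
The result can be checked numerically with a short Python sketch. The sampling parameters below (200 samples per cycle, five full cycles, A = 3) are arbitrary choices made for the illustration, and the function names are hypothetical; the mean square over the samples stands in for the average power.

```python
import math


def peak_to_peak(samples):
    """Amplitude as defined in the text: maximum minus minimum ordinate."""
    return max(samples) - min(samples)


def average_power(samples):
    """Mean of the squared samples (discrete average power)."""
    return sum(x * x for x in samples) / len(samples)


A = 3.0
omega_h = 2 * math.pi / 200   # phase advance per sample (assumed value)
n = 1000                      # five full cycles of 200 samples each

f = [A * math.sin(omega_h * j) for j in range(1, n + 1)]

lhs = peak_to_peak(f)
rhs = 2 * math.sqrt(2) * math.sqrt(average_power(f))
print(lhs, rhs)
```

Both quantities come out equal to 2A up to rounding error, as the derivation predicts for a sampled sine wave covering whole cycles.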